This is an archive of the discontinued LLVM Phabricator instance.

ThinLTOBitcodeWriter: delete comdats if their keys are renamed
AbandonedPublic

Authored by inglorion on Apr 5 2017, 4:42 PM.

Details

Summary

COFF requires that every comdat contain a symbol with the same name as the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this requirement to be violated. This change avoids such violations by deleting the affected comdats.

Event Timeline

inglorion created this revision.Apr 5 2017, 4:42 PM
pcc edited edge metadata.Apr 5 2017, 5:16 PM
pcc added a subscriber: rnk.

It isn't clear whether this does the right thing if the comdat contains associated members, only some of which need to be moved to the merged module. Although I believe that the frontend will never attach !type metadata to a global in a comdat with associated members, the asan globals feature could add an associated member to such a comdat.

An optimal fix would be to rename the comdat to match the symbol name and ensure that all comdat members are moved to the merged module if any member needs to be moved. But I think it would also be valid to remove the comdats from both modules, as this change (almost) does.

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
78

I think it would be simpler to do two passes over global_values, one where you decide which comdats to delete and the other where you delete them.

inglorion added inline comments.Apr 5 2017, 5:48 PM
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
78

Definitely simpler. Are you ok with iterating all the globals again? This more complex implementation tries to avoid some of the work by immediately deleting references to comdats already slated for deletion and indexing globals by comdat to avoid having to iterate over all the globals again (based on the assumption that most globals either won't have a comdat or the comdat will have the same name as the global).

Also, did you want me to iterate over the symbols in ImportM as well (as per your "remove the comdats from both modules")?

rnk added a comment.Apr 5 2017, 5:53 PM

Does clang generate internal globals with comdats that *don't* have other associated globals in that comdat? If so, that's probably a bug, but thinlto should still tolerate it.

Maybe thinlto is internalizing globals without removing their comdats?

mehdi_amini edited edge metadata.Apr 5 2017, 6:00 PM

COFF requires that every comdat contain a symbol with the same name as the comdat.

Is there a verifier for this? Also if I read correctly your test-case, the symbol @foo isn't part of the comdat $foo?
It isn't clear to me that this patch isn't working around a bug somewhere else.

rnk added a comment.Apr 5 2017, 6:29 PM
In D31735#719700, @pcc wrote:

It isn't clear whether this does the right thing if the comdat contains associated members, only some of which need to be moved to the merged module. Although I believe that the frontend will never attach !type metadata to a global in a comdat with associated members, the asan globals feature could add an associated member to such a comdat.

An optimal fix would be to rename the comdat to match the symbol name and ensure that all comdat members are moved to the merged module if any member needs to be moved. But I think it would also be valid to remove the comdats from both modules, as this change (almost) does.

I'd like to push to get the rename implemented. If we have an internal global with a comdat group, the frontend probably put it there for a good reason (the non-leader is probably metadata) and we shouldn't mess with it.

I hadn't considered that ThinLTO might try to import part of a comdat group, splitting the group. That seems like something we shouldn't do.

pcc added a comment.Apr 5 2017, 6:34 PM

Maybe thinlto is internalizing globals without removing their comdats?

This pass is being run in clang prior to internalization, which is done in the linker.

Also if I read correctly your test-case, the symbol @foo isn't part of the comdat $foo?

Aliases are part of the same comdat as their aliasee. The test case seems valid to me.

I'd like to push to get the rename implemented. If we have an internal global with a comdat group, the frontend probably put it there for a good reason (the non-leader is probably metadata) and we shouldn't mess with it.

If we remove comdat from an internal global, it will become a GC root. So we will handle it and its metadata correctly, but less optimally.


Actually now that I think about it, we would still have the asan globals problem for vtables with external linkage. So I suspect that we may have to move all members of externally visible comdats to the merged module if any member needs to be moved. That is orthogonal to this change because we would still need to handle internal comdats, for example by dropping them, as this change is doing.

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
78

Definitely simpler. Are you ok with iterating all the globals again?

Fine with me, modulo this being the right place for the fix.

Also, did you want me to iterate over the symbols in ImportM as well (as per your "remove the comdats from both modules")?

I think the best place to do it is in a separate function, where this function is responsible for collecting and the other one is responsible for removing. Basically the caller would do something like this:

std::set<Comdat *> ComdatsToRemove;
promoteInternals(*MergedM, M, ModuleId, ComdatsToRemove);
promoteInternals(M, *MergedM, ModuleId, ComdatsToRemove);

removeInternalComdats(M, ComdatsToRemove);
removeInternalComdats(*MergedM, ComdatsToRemove);
tejohnson edited edge metadata.Apr 5 2017, 8:03 PM
In D31735#719730, @rnk wrote:

I hadn't considered that ThinLTO might try to import part of a comdat group, splitting the group. That seems like something we shouldn't do.

Note that this isn't ThinLTO importing. This is splitting the module into two pieces (in clang) for the purposes of CFI and devirtualization, where one piece needs to use regular LTO module merging and the other uses ThinLTO style compilation.

In D31735#719738, @pcc wrote:

Actually now that I think about it, we would still have the asan globals problem for vtables with external linkage. So I suspect that we may have to move all members of externally visible comdats to the merged module if any member needs to be moved. That is orthogonal to this change because we would still need to handle internal comdats, for example by dropping them, as this change is doing.

I'm not familiar with the asan globals problem specifically, but yes, splitting and then removing the comdat would cause an issue if the comdat also contains any values with external linkage.

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
78

Agree with pcc, a two pass solution is cleaner and more readable.

@mehdi_amini: W.r.t. your question about verifying the requirements for comdats, this happens in Codegen/TargetLoweringObjectFileImpl.cpp for COFF, ELF, and Mach-O. They each have their own restrictions.

inglorion updated this revision to Diff 94437.Apr 6 2017, 2:10 PM

two-pass implementation

pcc added a comment.Apr 6 2017, 2:55 PM

Please add a test for the case where a comdat contains one member with !type metadata and another member without.

inglorion updated this revision to Diff 94453.Apr 6 2017, 4:04 PM

also test the case where the comdat leader has !type metadata and the other member doesn't

pcc added inline comments.Apr 6 2017, 4:35 PM
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
87

And because globals not in comdats are GC roots.

161

This can just enumerate global_objects instead of checking that the global is a GlobalObject.

test/Transforms/ThinLTOBitcodeWriter/comdat.ll
7–8

This isn't checking for the correct syntax for comdats.

18

remove ; before :

34–40

This appears to be one comdat with a single member (the leader) with metadata and another where the leader has metadata and the non-leader doesn't. Please add a non-leader to the former and change the latter so that the non-leader has metadata and the leader doesn't.

Please also give the globals less confusing names and group them by comdat.

There should also be a negative test showing that we keep the comdat if no global has type metadata.

inglorion marked 4 inline comments as done.Apr 11 2017, 4:47 PM

I've implemented more of the suggestions (keeping comdats together, renaming instead of deleting comdats) in D31963. If that diff looks good to you guys, I would like to land that one instead of this one.

pcc added a comment.Apr 11 2017, 4:57 PM

I've implemented more of the suggestions (keeping comdats together, renaming instead of deleting comdats) in D31963. If that diff looks good to you guys, I would like to land that one instead of this one.

Yes let's go with D31963, thanks for working on this!

inglorion abandoned this revision.Apr 11 2017, 6:47 PM