COFF requires that every comdat contain a symbol with the same name as the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this requirement to be violated. This change avoids such violations by deleting the affected comdats.
Details
Diff Detail
- Build Status
Buildable 5337 Build 5337: arc lint + arc unit
Event Timeline
It isn't clear whether this does the right thing if the comdat contains associated members, only some of which need to be moved to the merged module. Although I believe that the frontend will never attach !type metadata to a global in a comdat with associated members, the asan globals feature could add an associated member to such a comdat.
An optimal fix would be to rename the comdat to match the symbol name and ensure that all comdat members are moved to the merged module if any member needs to be moved. But I think it would also be valid to remove the comdats from both modules, as this change (almost) does.
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
79 | I think it would be simpler to do two passes over global_values, one where you decide which comdats to delete and the other where you delete them. |
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
79 | Definitely simpler. Are you ok with iterating all the globals again? This more complex implementation tries to avoid some of the work by immediately deleting references to comdats already slated for deletion and indexing globals by comdat to avoid having to iterate over all the globals again (based on the assumption that most globals either won't have a comdat or the comdat will have the same name as the global). Also, did you want me to iterate over the symbols in ImportM as well (as per your "remove the comdats from both modules")? |
Does clang generate internal globals with comdats that *don't* have other associated globals in that comdat? If so, that's probably a bug, but thinlto should still tolerate it.
Maybe thinlto is internalizing globals without removing their comdats?
COFF requires that every comdat contain a symbol with the same name as the comdat.
Is there a verifier for this? Also if I read correctly your test-case, the symbol @foo isn't part of the comdat $foo?
It isn't clear to me that this patch isn't working around a bug somewhere else.
I'd like to push to get the rename implemented. If we have an internal global with a comdat group, the frontend probably put it there for a good reason (the non-leader is probably metadata) and we shouldn't mess with it.
I hadn't considered that ThinLTO might try to import part of a comdat group, splitting the group. That seems like something we shouldn't do.
Maybe thinlto is internalizing globals without removing their comdats?
This pass is being run in clang prior to internalization, which is done in the linker.
Also if I read correctly your test-case, the symbol @foo isn't part of the comdat $foo?
Aliases are part of the same comdat as their aliasee. The test case seems valid to me.
I'd like to push to get the rename implemented. If we have an internal global with a comdat group, the frontend probably put it there for a good reason (the non-leader is probably metadata) and we shouldn't mess with it.
If we remove comdat from an internal global, it will become a GC root. So we will handle it and its metadata correctly, but less optimally.
Actually now that I think about it, we would still have the asan globals problem for vtables with external linkage. So I suspect that we may have to move all members of externally visible comdats to the merged module if any member needs to be moved. That is orthogonal to this change because we would still need to handle internal comdats, for example by dropping them, as this change is doing.
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
79 |
Fine with me, modulo this being the right place for the fix.
I think the best place to do it is in a separate function, where this function is responsible for collecting and the other one is responsible for removing. Basically the caller would do something like this: std::set<Comdat *> ComdatsToRemove; promoteInternals(*MergedM, M, ModuleId, ComdatsToRemove); promoteInternals(M, *MergedM, ModuleId, ComdatsToRemove); removeInternalComdats(M, ComdatsToRemove); removeInternalComdats(*MergedM, ComdatsToRemove); |
Note that this isn't ThinLTO importing. This is splitting the module into two pieces (in clang) for the purposes of CFI and devirtualization, where one piece needs to use regular LTO module merging and the other uses ThinLTO style compilation.
I'm not familiar with the asan globals problem specifically, but yes, splitting and then removing the comdat would cause an issue if the comdat also contains any values with external linkage.
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
79 | Agree with pcc, a two pass solution is cleaner and more readable. |
@mehdi_amini: W.r.t. your question about verifying the requirements for comdats, this happens in Codegen/TargetLoweringObjectFileImpl.cpp for COFF, ELF, and Mach-O. They each have their own restrictions.
Please add a test for the case where a comdat contains one member with !type metadata and another member without.
also test the case where the comdat leader has !type metadata and the other member doesn't
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp | ||
---|---|---|
98 | And because globals not in comdats are GC roots. | |
182 | This can just enumerate global_objects instead of checking that the global is a GlobalObject. | |
test/Transforms/ThinLTOBitcodeWriter/comdat.ll | ||
8–9 | This isn't checking for the correct syntax for comdats. | |
19 | remove ; before : | |
35–41 | This appears to be one comdat with a single member (the leader) with metadata and another where the leader has metadata and the non-leader doesn't. Please add a non-leader to the former and change the latter so that the non-leader has metadata and the leader doesn't. Please also give the globals less confusing names and group them by comdat. There should also be a negative test showing that we keep the comdat if no global has type metadata. |
I've implemented more of the suggestions (keeping comdats together, renaming instead of deleting comdats) in D31963. If that diff looks good to you guys, I would like to land that one instead of this one.
I think it would be simpler to do two passes over global_values, one where you decide which comdats to delete and the other where you delete them.