The LLVM comdat operation specifies how to deduplicate globals with the
same key in two different object files. This is necessary on Windows
where e.g. two object files with linkonce globals will not link unless
a comdat for those globals is specified. It is also supported in the ELF
format.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm not 100% sure I've done this correctly as my understanding of comdats is somewhat limited. We need this for flang where we have some linkonce and linkonce_odr globals that we can't link on Windows without comdats.
I originally wanted to have a separate symbol table for comdats so that they could share the same symbol with the global they are defining the comdat for, as this is how it works in LLVM, but I couldn't quite get that to work. If we'd prefer it to work that way then any pointers on how to do that would be much appreciated!
I'm not 100% sure I've done this correctly as my understanding of comdats is somewhat limited. We need this for flang where we have some linkonce and linkonce_odr globals that we can't link on Windows without comdats. I originally wanted to have a separate symbol table for comdats so that they could share the same symbol with the global they are defining the comdat for, as this is how it works in LLVM, but I couldn't quite get that to work. If we'd prefer it to work that way then any pointers on how to do that would be much appreciated!
I would recommend to have a look at how the alias analysis metadata is handled in LLVM dialect:
llvm.metadata @__llvm_global_metadata { llvm.alias_scope_domain @domain_0 { description = "copy" } llvm.alias_scope @scope_1 { description = "copy: argument 0", domain = @domain_0 } } llvm.load %arg0 { alias_scopes = [@__llvm_global_metadata::@scope_1] }
It uses a global llvm.metadata operation that has a region which contains the individual metadata nodes. The llvm.metadata operation itself serves as a symbol table for the individual alias metadata operations. I.e., you could introduce an llvm.comdat operation that defines one global symbol under which you nest the comdat symbols for the individual global variables. Is this roughly what your separate symbol table approach looks like? I think having one global operation that nests all comdat information is cleaner than your current solution since there is only one symbol (the one of the llvm.comdat operation) that pollutes the global namespace.
mlir/include/mlir/Dialect/LLVMIR/LLVMEnums.td | ||
---|---|---|
617 | Is the DialectAttr wrapper needed? If the enum is only used as an operation attribute then the DialectAttr may not be necessary. AtomicOrdering is an example for an enum that does not need DialectAttr. If you plan to use the attribute as a discardable attribute outside of the LLVM dialect, then the DialectAttr wrapper may be needed. | |
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
1319 | Please use a SymbolRef / FlatSymbolRef attribute when referencing symbols. | |
1428 | Could you use CArg<"StringRef", StringRef()> here and use the empty string reference instead of nullopt? | |
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp | ||
41 | nit: This looks like a leftover? |
Thanks for the review! This is probably more like what I was looking for yes, I'll rework the patch to use something similar to the metadata op. Thanks for the pointer! I assume it's possible with llvm.metadata to add to it at multiple points, rather than the entire op having to be created at one place?
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1428 | Ah, yes probably. This was a holdover from an initial prototype where I allowed to specify just that there was a comdat and that it would look up one with the same name as the function (as LLVM allows this) but I don't think that's really necessary and complicates things a bit. If anything we could just add a function to create a comdat with the same name anyway, but in the printed IR form have it always be explicit. |
Thanks for the review! This is probably more like what I was looking for yes, I'll rework the patch to use something similar to the metadata op. Thanks for the pointer! I assume it's possible with llvm.metadata to add to it at multiple points, rather than the entire op having to be created at one place?
Yes you can have multiple llvm.metadata operations. The import from LLVM IR currently generates only one llvm.metadata operation since every metadata operation you introduce somewhat pollutes the global namespace. We thus went for creating just one metadata operation with a long name that will not conflict with any other symbol. However, you can also generate IR with multiple such operations and it should lower correctly to LLVM IR. The same can be implemented for a possible new llvm.comdat operation.
You may also consider renaming llvm.metadata and make it multi-purpose so that it can contain metadata and comdat information. I currently have a slight preference for a separate llvm.comdat operation since this seem to be different things and since we may refactor metadata to be purely attribute based sometime in the future. However, I do not have a strong opinion and you are for sure deeper in the comdat topic than me :). Maybe comdat can be consider as some sort of metadata?
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1428 |
Ah right I have seen this LLVM feature. I would probably prefer to make things explicit to avoid such special case handling. |
As another thought, one thing I am unclear on is what value there is, if any, in having multiple COMDAT sections with the same selection type. If it makes no difference, we could just have this be an attribute on global just like linkage is, and just give all any comdat globals the same LLVM comdat when lowering to LLVM IR. I suspect there's a reason that isn't how it works in LLVM though, but I'm not clear what it is
COMDAT has a key associated with it. Only when the key is the same, the linker will attempt deduplication.
While there might not be an immediate use case for this behaviour, not modelling the key will potentially break the LLVM to MLIR import.
COMDAT has a key associated with it. Only when the key is the same, the linker will attempt deduplication.
Yes that makes sense! Also note that deduplication kinds such as "exactmatch" or "largets" actually indicate that the grouping of different symbols into comdat sections matters.
Here are some interesting examples of C++ code that generate interesting comdats:
https://gcc.godbolt.org/z/EjGMxMcra -- uses the largest selection for the vtable in the MS C++ ABI (The symbol starting with "?_7" is the vtable)
https://gcc.godbolt.org/z/ETv9oPK95 -- Use of the D5 comdat group so that we can alias the D1 destructor to D2 to save code size
That should give you some real world examples of interesting comdats in LLVM IR that you might want to translate. I think there may be another use in PGO data, but I'm not sure.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
---|---|---|
724 | I don't understand why you are appending the name of the selector here. The name of the LLVM comdat is significant and typically should exactly match some other global symbol. |
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
---|---|---|
724 | This isn't appending the name of the selector, it's appending the name of the symbol corresponding to the selector. Or, using an example: llvm.comdat @__comdat_region { llvm.comdat_selector @foo any } comes out in LLVM IR as comdat $__comdat_region_foo any we discussed above doing it this way so that both in the case of importing from LLVM IR, or generating from another language, can just make a single MLIR llvm.comdat region and put all the comdat symbols in there, to avoid polluting the global namespace. This is somewhat emulating the idea that in LLVM comdat symbols are in a separate symbol table with symbols prefixed by $ instead of @, but without making significant changes to MLIR to actually do that. We also discussed whether we should emulate the behaviour where they match the global symbol that they are specifying the comdat for and decided to be explicit here instead. Is there a specific reason that they should match the symbol or is that just the convention in LLVM? I may have fully misunderstood comdats, my knowledge is basically only driven by what we require for flang, so any more context would be really useful! |
Thanks for updating and thanks @rnk for the examples.
I think implementing the import of comdat's in ModuleImport.cpp could be a great opportunity to roundtrip the examples through MLIR back to LLVM IR. Ideally we should obtain more or less the same LLVM IR after the roundtrip.
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1533–1534 | I think after the update description and summary should be update to reflect the new approach. | |
1552 | Similarly, the description and summary need to be updated here. | |
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp | ||
1606 | Please add a test to invalid.mlir that verifies the error message is produced as expected. Additionally, the operations that have comdat's attached should also verify that the symbol references point to a valid comdat selector operation. I expect the implementation would be similar to verifySymbolRefsPointTo in LLVMInterfaces.cpp, which is used to verify the symbol references that point to metadata. | |
1607 | nit: the braces can be dropped here. | |
1886 | nit: the llvm:: prefix is most likely not needed here? | |
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
724 | if I understand @rnk's comment correctly the exported comdat should not have the comdat region prefix. That makes sense to me. We should probably drop the __comdat_region prefix when translating to LLVM IR. For your example, we thus would obtain the following MLIR: comdat $foo any The __comdat_region operation mainly exists to prevent naming conflicts between global symbols and comdat names since MLIR does not have a separate mechanism to model comdat names. The only drawback in doing so is that we should verify this does not produce conflicting comdat names since there could be multiple comdat operations containing the same name. We may want to introduce a verifier to prevent this and/or check there are no conflicts when translating to LLVM IR. | |
1414 | ComdatSelectorOp should not be a top-level operation? | |
mlir/test/Dialect/LLVMIR/comdat.mlir | ||
15 | ultra nit: please add the missing newline. |
Add LLVM IR import
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp | ||
---|---|---|
1606 | Is what I have on line 1890 in this file not sufficient to check that? |
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp | ||
---|---|---|
1606 |
I think that is sufficient. I would have expected two lookups are needed but I guess here one is ok since global op a module level thing. Adding a test will make sure it really works. Also if there are other operations, such as llvm.func, that can have comdat it may make to factor this out into an independent function? |
Thanks!
Can you add a test for the import as well? E.g. in a new /llvm-project/mlir/test/Target/LLVMIR/Import/comdat.ll file.
There are some older comments from ealier reviews that still need clarification. I tried to mark them. It also makes sense to resolve all the comments that have been addressed to make sure everything is catched.
Also two general coding guideline comments:
- please use camelBack naming for variables, for example cdop should be renamed to comdatOp.
- auto should be avoided except the type is explicit on the rhs of the expression or if the type is very long.
Can you make a pass to fix these. I think I marked most but not all instances.
mlir/include/mlir/Dialect/LLVMIR/LLVMEnums.td | ||
---|---|---|
617 | Any updates on this? | |
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
1428 | Can you address to comment or is the optional still needed? | |
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp | ||
1674 | nit: please rename to comdat or comdatAttr. | |
1745 | Can you rename to 'verifyComdat' to and hand in the comdat attribute directly? I would also move in `emitOpError() << "expected comdat symbol" into the function itself since the error message is not specific to GlobalOp. | |
1747 | nit: please rename to comdatOp | |
mlir/lib/Target/LLVMIR/ModuleImport.cpp | ||
85 ↗ | (On Diff #531268) | nit: can you add a comment similar to the one above |
559 ↗ | (On Diff #531268) | nit: rename to comdatOp or similar here and further below. |
567 ↗ | (On Diff #531268) | Can you put OpBuilder::InsertionGuard guard(builder); before builder.setInsertionPointToEnd(&cdregion.getBody().back());. |
890 ↗ | (On Diff #531268) | nit: please rename to comdat or similar. |
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
719 | Can you spell out the type here and below if the type is not overly complex. | |
722 | nit: please rename llcomdat to llvmComdat. | |
mlir/test/Dialect/LLVMIR/invalid.mlir | ||
1440 ↗ | (On Diff #531268) | ultra nit: please add a newline at the end of the file. |
mlir/test/Target/LLVMIR/llvmir-invalid.mlir | ||
256 ↗ | (On Diff #531268) | ultra nit: please add a newline at the end of the file. |
Also two general coding guideline comments:
- please use camelBack naming for variables, for example cdop should be renamed to comdatOp.
- auto should be avoided except the type is explicit on the rhs of the expression or if the type is very long.
Can you make a pass to fix these. I think I marked most but not all instances.
Sorry about that, I'm used to working on flang where the style is different.. I think I've caught all these now!
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1428 | I couldn't get this to be representable as a StringRef when the SymbolRef must contain a nested symbol. That's why I reverted to optional here |
Thanks!
I have a few last comments.
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1428 | I was thinking of CArg<"SymbolRefAttr", "{}">:$comdat, The implementation of the builder would then have to check if the comdat argument is non-null. | |
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp | ||
26 | nit: I suspect that got added by the IDE and is probably not necessary. | |
1746–1748 | If you want to keep the comment braces are required. Comments should start upper case and end with a dot. However, in this case the comment seems not necessary. | |
1884–1885 | Please shorten to: if (failed(verifyComdat(getOperation(), getComdat())) return failure(); | |
mlir/lib/Target/LLVMIR/ModuleImport.cpp | ||
567 ↗ | (On Diff #531268) | Sorry for not being precise enough. You can now delete: builder.setInsertionPointAfter(comdat); since the insertion guard should make it redundant. |
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1428 | Ah, I didn't realise that would just work. Thanks! |
LGTM!
I marked two more things that possibly can be deleted. One seems to be a leftover from the more complicated attribute definition. I am 90% sure it can go away :). Please give it a try before landing.
mlir/include/mlir/Dialect/LLVMIR/LLVMAttrs.h | ||
---|---|---|
69 | nit: I believe this can be dropped now? | |
mlir/test/Dialect/LLVMIR/invalid.mlir | ||
1425 ↗ | (On Diff #532147) | nit: this change seems unrelated? |
Thanks for all the help on this review! I made both changes you suggested above before landing
nit: I believe this can be dropped now?