This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add support for LLVMIR comdat operation
ClosedPublic

Authored by DavidTruby on May 17 2023, 9:13 AM.

Details

Summary

The LLVM comdat operation specifies how to deduplicate globals with the
same key in two different object files. This is necessary on Windows
where e.g. two object files with linkonce globals will not link unless
a comdat for those globals is specified. It is also supported in the ELF
format.

Diff Detail

Event Timeline

DavidTruby created this revision.May 17 2023, 9:13 AM
Herald added a project: Restricted Project. · View Herald Transcript
DavidTruby requested review of this revision.May 17 2023, 9:13 AM

I'm not 100% sure I've done this correctly as my understanding of comdats is somewhat limited. We need this for flang where we have some linkonce and linkonce_odr globals that we can't link on Windows without comdats.
I originally wanted to have a separate symbol table for comdats so that they could share the same symbol with the global they are defining the comdat for, as this is how it works in LLVM, but I couldn't quite get that to work. If we'd prefer it to work that way then any pointers on how to do that would be much appreciated!

I'm not 100% sure I've done this correctly as my understanding of comdats is somewhat limited. We need this for flang where we have some linkonce and linkonce_odr globals that we can't link on Windows without comdats. I originally wanted to have a separate symbol table for comdats so that they could share the same symbol with the global they are defining the comdat for, as this is how it works in LLVM, but I couldn't quite get that to work. If we'd prefer it to work that way then any pointers on how to do that would be much appreciated!

I would recommend to have a look at how the alias analysis metadata is handled in LLVM dialect:

llvm.metadata @__llvm_global_metadata {
  llvm.alias_scope_domain @domain_0 {  description = "copy" }
  llvm.alias_scope @scope_1 { description = "copy: argument 0", domain = @domain_0 }
}

llvm.load %arg0 { alias_scopes = [@__llvm_global_metadata::@scope_1] }

It uses a global llvm.metadata operation that has a region which contains the individual metadata nodes. The llvm.metadata operation itself serves as a symbol table for the individual alias metadata operations. I.e., you could introduce an llvm.comdat operation that defines one global symbol under which you nest the comdat symbols for the individual global variables. Is this roughly what your separate symbol table approach looks like? I think having one global operation that nests all comdat information is cleaner than your current solution since there is only one symbol (the one of the llvm.comdat operation) that pollutes the global namespace.

mlir/include/mlir/Dialect/LLVMIR/LLVMEnums.td
617

Is the DialectAttr wrapper needed?

If the enum is only used as an operation attribute then the DialectAttr may not be necessary. AtomicOrdering is an example for an enum that does not need DialectAttr.

If you plan to use the attribute as a discardable attribute outside of the LLVM dialect, then the DialectAttr wrapper may be needed.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1324

Please use a SymbolRef / FlatSymbolRef attribute when referencing symbols.

1433

Could you use CArg<"StringRef", StringRef()> here and use the empty string reference instead of nullopt?

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
41

nit: This looks like a leftover?

Thanks for the review! This is probably more like what I was looking for yes, I'll rework the patch to use something similar to the metadata op. Thanks for the pointer! I assume it's possible with llvm.metadata to add to it at multiple points, rather than the entire op having to be created at one place?

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1433

Ah, yes probably. This was a holdover from an initial prototype where I allowed to specify just that there was a comdat and that it would look up one with the same name as the function (as LLVM allows this) but I don't think that's really necessary and complicates things a bit. If anything we could just add a function to create a comdat with the same name anyway, but in the printed IR form have it always be explicit.

gysit added a comment.May 23 2023, 4:08 AM

Thanks for the review! This is probably more like what I was looking for yes, I'll rework the patch to use something similar to the metadata op. Thanks for the pointer! I assume it's possible with llvm.metadata to add to it at multiple points, rather than the entire op having to be created at one place?

Yes you can have multiple llvm.metadata operations. The import from LLVM IR currently generates only one llvm.metadata operation since every metadata operation you introduce somewhat pollutes the global namespace. We thus went for creating just one metadata operation with a long name that will not conflict with any other symbol. However, you can also generate IR with multiple such operations and it should lower correctly to LLVM IR. The same can be implemented for a possible new llvm.comdat operation.

You may also consider renaming llvm.metadata and make it multi-purpose so that it can contain metadata and comdat information. I currently have a slight preference for a separate llvm.comdat operation since this seem to be different things and since we may refactor metadata to be purely attribute based sometime in the future. However, I do not have a strong opinion and you are for sure deeper in the comdat topic than me :). Maybe comdat can be consider as some sort of metadata?

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1433

Ah, yes probably. This was a holdover from an initial prototype where I allowed to specify just that there was a comdat and that it would look up one with the same name as the function

Ah right I have seen this LLVM feature. I would probably prefer to make things explicit to avoid such special case handling.

As another thought, one thing I am unclear on is what value there is, if any, in having multiple COMDAT sections with the same selection type. If it makes no difference, we could just have this be an attribute on global just like linkage is, and just give all any comdat globals the same LLVM comdat when lowering to LLVM IR. I suspect there's a reason that isn't how it works in LLVM though, but I'm not clear what it is

As another thought, one thing I am unclear on is what value there is, if any, in having multiple COMDAT sections with the same selection type. If it makes no difference, we could just have this be an attribute on global just like linkage is, and just give all any comdat globals the same LLVM comdat when lowering to LLVM IR. I suspect there's a reason that isn't how it works in LLVM though, but I'm not clear what it is

COMDAT has a key associated with it. Only when the key is the same, the linker will attempt deduplication.
While there might not be an immediate use case for this behaviour, not modelling the key will potentially break the LLVM to MLIR import.

COMDAT has a key associated with it. Only when the key is the same, the linker will attempt deduplication.

Yes that makes sense! Also note that deduplication kinds such as "exactmatch" or "largets" actually indicate that the grouping of different symbols into comdat sections matters.

DavidTruby updated this revision to Diff 527431.Jun 1 2023, 8:32 AM

Add comdat region to contain comdat selection operations

rnk added a comment.Jun 1 2023, 11:56 AM

Here are some interesting examples of C++ code that generate interesting comdats:
https://gcc.godbolt.org/z/EjGMxMcra -- uses the largest selection for the vtable in the MS C++ ABI (The symbol starting with "?_7" is the vtable)
https://gcc.godbolt.org/z/ETv9oPK95 -- Use of the D5 comdat group so that we can alias the D1 destructor to D2 to save code size

That should give you some real world examples of interesting comdats in LLVM IR that you might want to translate. I think there may be another use in PGO data, but I'm not sure.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
734

I don't understand why you are appending the name of the selector here. The name of the LLVM comdat is significant and typically should exactly match some other global symbol.

DavidTruby added inline comments.Jun 1 2023, 2:59 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
734

This isn't appending the name of the selector, it's appending the name of the symbol corresponding to the selector. Or, using an example:

llvm.comdat @__comdat_region {
  llvm.comdat_selector @foo any
}

comes out in LLVM IR as

comdat $__comdat_region_foo any

we discussed above doing it this way so that both in the case of importing from LLVM IR, or generating from another language, can just make a single MLIR llvm.comdat region and put all the comdat symbols in there, to avoid polluting the global namespace. This is somewhat emulating the idea that in LLVM comdat symbols are in a separate symbol table with symbols prefixed by $ instead of @, but without making significant changes to MLIR to actually do that.

We also discussed whether we should emulate the behaviour where they match the global symbol that they are specifying the comdat for and decided to be explicit here instead. Is there a specific reason that they should match the symbol or is that just the convention in LLVM?

I may have fully misunderstood comdats, my knowledge is basically only driven by what we require for flang, so any more context would be really useful!

gysit added a comment.Jun 5 2023, 12:11 AM

Thanks for updating and thanks @rnk for the examples.

I think implementing the import of comdat's in ModuleImport.cpp could be a great opportunity to roundtrip the examples through MLIR back to LLVM IR. Ideally we should obtain more or less the same LLVM IR after the roundtrip.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1538–1539

I think after the update description and summary should be update to reflect the new approach.

1557

Similarly, the description and summary need to be updated here.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
1621

Please add a test to invalid.mlir that verifies the error message is produced as expected.

Additionally, the operations that have comdat's attached should also verify that the symbol references point to a valid comdat selector operation. I expect the implementation would be similar to verifySymbolRefsPointTo in LLVMInterfaces.cpp, which is used to verify the symbol references that point to metadata.

1622

nit: the braces can be dropped here.

1935

nit: the llvm:: prefix is most likely not needed here?

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
734

if I understand @rnk's comment correctly the exported comdat should not have the comdat region prefix. That makes sense to me. We should probably drop the __comdat_region prefix when translating to LLVM IR. For your example, we thus would obtain the following MLIR:

comdat $foo any

The __comdat_region operation mainly exists to prevent naming conflicts between global symbols and comdat names since MLIR does not have a separate mechanism to model comdat names. The only drawback in doing so is that we should verify this does not produce conflicting comdat names since there could be multiple comdat operations containing the same name. We may want to introduce a verifier to prevent this and/or check there are no conflicts when translating to LLVM IR.

1419

ComdatSelectorOp should not be a top-level operation?

mlir/test/Dialect/LLVMIR/comdat.mlir
16

ultra nit: please add the missing newline.

Add LLVM IR import

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
1621

Is what I have on line 1890 in this file not sufficient to check that?

DavidTruby marked 7 inline comments as done.Jun 12 2023, 8:13 AM
gysit added inline comments.Jun 12 2023, 9:07 AM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
1621

Is what I have on line 1890 in this file not sufficient to check that?

I think that is sufficient. I would have expected two lookups are needed but I guess here one is ok since global op a module level thing.

Adding a test will make sure it really works.

Also if there are other operations, such as llvm.func, that can have comdat it may make to factor this out into an independent function?

Add test for invalid comdat reference

gysit added a comment.Jun 15 2023, 4:03 AM

Thanks!

Can you add a test for the import as well? E.g. in a new /llvm-project/mlir/test/Target/LLVMIR/Import/comdat.ll file.

There are some older comments from ealier reviews that still need clarification. I tried to mark them. It also makes sense to resolve all the comments that have been addressed to make sure everything is catched.

Also two general coding guideline comments:

  • please use camelBack naming for variables, for example cdop should be renamed to comdatOp.
  • auto should be avoided except the type is explicit on the rhs of the expression or if the type is very long.

Can you make a pass to fix these. I think I marked most but not all instances.

mlir/include/mlir/Dialect/LLVMIR/LLVMEnums.td
617

Any updates on this?

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1433

Can you address to comment or is the optional still needed?

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
1695

nit: please rename to comdat or comdatAttr.

1766

Can you rename to 'verifyComdat' to and hand in the comdat attribute directly? I would also move in

`emitOpError() << "expected comdat symbol"

into the function itself since the error message is not specific to GlobalOp.

1768

nit: please rename to comdatOp

mlir/lib/Target/LLVMIR/ModuleImport.cpp
85

nit: can you add a comment similar to the one above

561

nit: rename to comdatOp or similar here and further below.

569

Can you put

OpBuilder::InsertionGuard guard(builder);

before builder.setInsertionPointToEnd(&cdregion.getBody().back());.

892

nit: please rename to comdat or similar.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
729

Can you spell out the type here and below if the type is not overly complex.

732

nit: please rename llcomdat to llvmComdat.

mlir/test/Dialect/LLVMIR/invalid.mlir
1440

ultra nit: please add a newline at the end of the file.

mlir/test/Target/LLVMIR/llvmir-invalid.mlir
256

ultra nit: please add a newline at the end of the file.

DavidTruby marked 16 inline comments as done.

Address misc review comments.

Also two general coding guideline comments:

  • please use camelBack naming for variables, for example cdop should be renamed to comdatOp.
  • auto should be avoided except the type is explicit on the rhs of the expression or if the type is very long.

Can you make a pass to fix these. I think I marked most but not all instances.

Sorry about that, I'm used to working on flang where the style is different.. I think I've caught all these now!

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1433

I couldn't get this to be representable as a StringRef when the SymbolRef must contain a nested symbol. That's why I reverted to optional here

gysit added a comment.Jun 16 2023, 7:07 AM

Thanks!

I have a few last comments.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1433

I was thinking of

CArg<"SymbolRefAttr", "{}">:$comdat,

The implementation of the builder would then have to check if the comdat argument is non-null.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
26

nit: I suspect that got added by the IDE and is probably not necessary.

1767–1769

If you want to keep the comment braces are required. Comments should start upper case and end with a dot. However, in this case the comment seems not necessary.

1933–1934

Please shorten to:

if (failed(verifyComdat(getOperation(), getComdat()))
  return failure();
mlir/lib/Target/LLVMIR/ModuleImport.cpp
569

Sorry for not being precise enough. You can now delete:

builder.setInsertionPointAfter(comdat);

since the insertion guard should make it redundant.

Remove std::optional from SymbolRefAttr and address other comments

DavidTruby marked 4 inline comments as done.Jun 16 2023, 7:46 AM
DavidTruby added inline comments.
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
1433

Ah, I didn't realise that would just work. Thanks!

gysit accepted this revision.Jun 16 2023, 8:07 AM

LGTM!

I marked two more things that possibly can be deleted. One seems to be a leftover from the more complicated attribute definition. I am 90% sure it can go away :). Please give it a try before landing.

mlir/include/mlir/Dialect/LLVMIR/LLVMAttrs.h
69

nit: I believe this can be dropped now?

mlir/test/Dialect/LLVMIR/invalid.mlir
1425

nit: this change seems unrelated?

This revision is now accepted and ready to land.Jun 16 2023, 8:07 AM
This revision was automatically updated to reflect the committed changes.
DavidTruby marked an inline comment as done.
DavidTruby added a comment.EditedJun 19 2023, 6:21 AM

LGTM!

Thanks for all the help on this review! I made both changes you suggested above before landing