This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Support TBAA metadata in LLVMIR dialect.
ClosedPublic

Authored by vzakhari on Dec 29 2022, 4:11 PM.

Details

Summary

This change introduces new LLVMIR dialect operations to represent
TBAA root, type descriptor and access tag metadata nodes.

For the purpose of importing TBAA metadata from LLVM IR it only
supports the current version of TBAA format described in
https://llvm.org/docs/LangRef.html#tbaa-metadata (i.e. size-aware
representation introduced in D41501 is not supported).

TBAA attribute support is only added for LLVM::LoadOp and LLVM::StoreOp.
Support for intrinsics operations (e.g. LLVM::MemcpyOp) may be added later.

The TBAA attribute is represented as an array of access tags, though,
LLVM IR supports only single access tag per memory accessing instruction.
I implemented it as an array anticipating similar support in LLVM IR
to combine TBAA graphs with different roots for Flang - one of the options
described in https://docs.google.com/document/d/16kKZVmI585wth01VSaJAqZMZpoX68rcdBmgfj0kNAt0/edit#heading=h.jzzheaz9vqac

It should be easy to restrict MLIR operation to a single access tag,
if we end up using a different approach for Flang.

Diff Detail

Event Timeline

vzakhari created this revision.Dec 29 2022, 4:11 PM
vzakhari requested review of this revision.Dec 29 2022, 4:11 PM
ftynse requested changes to this revision.Dec 30 2022, 6:53 AM

This change deserves a short RFC due to its non-trivial design component. Specifically, the handling of potential cycles and the need for some symbol references to be fully qualified need to be described.

mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
55

Please prefix with llvm.

mlir/include/mlir/Target/LLVMIR/ModuleImport.h
155

Nit: add the doc comment.

234

Nit: we don't use Latex-style commands in MLIR documentation. Inline code can be added with Markdown-style backticks.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
2580–2593

Nit: we usually do something like

if (parser.parseLess() ||
    parser.parseAttribute(...) ||
    parser.parseComma())
  return failure();
...

to make the syntax more "visible" in the code.

2622

This verification ends up being very expensive because we keep retraversing the region of the symbol table op on every call to this function. The less expensive approach is to perform the verification of symbol references in the verifier of the symbol table op. That verifier can traverse the IR twice: the first time to collect all symbols (just use SymbolTable APIs), and the second time to find all references and check their validity.

mlir/lib/Target/LLVMIR/ModuleImport.cpp
53

Nit: Please document top-level entities

ftynse added inline comments.Dec 30 2022, 6:53 AM
mlir/lib/Target/LLVMIR/ModuleImport.cpp
342

Nit: flip the condition + use continue to decrease indentation below.

343

I find it strange that we will just ignore the metadata on some ops, or if it isn't present in the mapping. This is something that we should report IMO. Potentially as a warning because it's not fatal to the translation process.

359–361

Can't we have the builder of the MetadataOp do this instead?

360

I'm not sure we want to use ReturnOp as a terminator for metadata. It has the semantics of returning from the function with the associated control/data flow-related interfaces.

Also nit: can we use a better location than unknown? We know at least the file name.

376–377

Can't this just use SetVector and avoid the potential footgun of two parallel containers?

394

Nit: there's llvm::for_each. Also, a simple for loop with return in the middle is likely more concise here.

398

We typically want a test for any user-visible error message. I don't see one. Can this happen on valid IR? If not, it's better to turn this into an assert, eventually after emitting a diagnostic.

398–399

I wonder if we should just linearize (e.g., topological sort) the entire graph of MDNodes in the first sweep and then go over that list.
Don't remember offhand, is the graph expected to be a DAG or are cycles legit and just not supported (yet) here? Actually, do we care about the order of conversion? All references go through symbols, it should be okay to create a symbol after creating a reference to it, as long as all references are resolved in the end.

433–434

Nit: don't use specific numbers of stack elements in vectors unless you have a strong reason to pick up, 10 feels arbitrary here.

437

Nit: const auto *, I suppose. clang-tidy will complain.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
1189–1191

I would emit a warning instead. Ignoring metadata completely is okay, ignoring it partially sounds like a bug and we should notify the user what we are doing.

1204

mlirModule->getBody().getOps<LLVM::MetadataOp>() is going to be much cheaper because it wouldn't traverse the module in depth, metadata ops can only appear at the top level in a module.

1206

Nit: no need to prefix SmallVector with llvm::. Also, drop the explicit number of stack elements.

1213

Nit: we prefer explicit nullptr for null pointers in MLIR.

1237

Same as above.

This revision now requires changes to proceed.Dec 30 2022, 6:53 AM
vzakhari marked 11 inline comments as done.Dec 30 2022, 6:10 PM

Hi @ftynse, I fixed most of the remarks, and will update the files shortly.

I will change the ModuleImport in a separate update.

mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
55

Yes, it seemed off to me that llvm prefix is not used for noalias_scopes, alias_scopes, parallel_access, options and access_groups as well.

The tablegen does not prefix the attribute names with llvm. because of the way the attributes are defined for operations, e.g.:

def LLVM_LoadOp : LLVM_Op<"load">, MemoryOpWithAlignmentAndAttributes {
  let arguments = (ins Arg<LLVM_PointerTo<LLVM_LoadableType>, "", [MemRead]>:$addr,
                   OptionalAttr<SymbolRefArrayAttr>:$access_groups,
                   OptionalAttr<SymbolRefArrayAttr>:$alias_scopes,
                   OptionalAttr<SymbolRefArrayAttr>:$noalias_scopes,
                   OptionalAttr<SymbolRefArrayAttr>:$tbaa,

Do you have a suggestion how to do this right? Should I create special LLVM attribute to represent an array of SymbolRefAttr's?

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
2622

Thank you for the suggestion! I will do this as part of MetadataOp verification.

mlir/lib/Target/LLVMIR/ModuleImport.cpp
343

I added a warning for the missing mapping. It should never kick in, though, because this would mean some TBAA mapping logic is broken before this code is invoked.

Since setAAMetadata is invoked via mlirBuilder path, it will currently engage only for LoadOp and StoreOp, so adding a Default case with a warning will not help to report that the metadata is dropped for CallOp etc. And adding support for other operations in LLVMOps seems to be too much for this change.

359–361

I guess we can. I can do this in a separate commit, because I do not know how many changes it will involve in other places.

360

I agree. I think it makes sense to declare MetadataOp as SingleBlock and NoTerminator, but not as part of this change.

FWIW, unknown location is already used for "global" entities such as GlobalVariable, Constant, but I will use FileLineColLoc for convenience.

398–399

I believe the graph must not have cycles, at least TypeBasedAliasAnalysis verifies that there is no cycles in the Parent relation path https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp#L494
From the language point of view the cycle would mean a type containing itself (struct s { struct s x; }).

I think you are right, and the order does not matter here, since we can create references before definitions, so I will just collect all reachable nodes and convert them without any cycle detection. I think we should check for cycles in the dialect verification code (during MetadataOp region verification). Does it sound right?

vzakhari updated this revision to Diff 485734.Dec 30 2022, 6:10 PM
vzakhari marked an inline comment as done.
gysit added a comment.Jan 2 2023, 6:06 AM

Thanks for adding tbaa support!

I am currently working on making the LLVM IR import extensible (https://reviews.llvm.org/D140374 and https://reviews.llvm.org/D140556). Especially the second revision has a bit of overlap with your work (it aims at making the metadata import extensible). For example, we may want to call the setAAMetadata function from LLVMDialectLLVMIRImportInterface::setMetadataAttrs. I think that can be done as a follow up once both revisions landed.

This is the corresponding rfc:
https://discourse.llvm.org/t/rfc-extensible-llvm-ir-import/67256

mlir/lib/Target/LLVMIR/ModuleImport.cpp
348

Are you sure the MLIR CallOp needs to be supported?

I may be wrong but I would assume LLVMIR CallInst supports the attribute due to the numerous memory access related intrinsics (e.g. memcpy or masked vector load/store). These are converted to actual MLIR operations rather than function calls. I would thus assume that these intrinsics plus probably the atomic operations have to be supported but not MLIR CallOp?

360

I think using FileLineColLoc with the file name is a good idea and if I am not mistaken it should be accessible via mlirModule->getLoc().

501

nit: quailified -> qualified.

511–513

nit: return emitError(loc) << ... directly returns failure.

vzakhari marked 2 inline comments as done.Jan 3 2023, 12:06 PM

Hi @gysit, thank you for the review and the links! I will rebase and see what additional changes are needed.

mlir/lib/Target/LLVMIR/ModuleImport.cpp
348

I looked through the cases where clang attaches MD_tbaa to instructions, and I think you are right: it only attaches it to loads/stores and memcpys currently. I cannot think of an example where Flang would attach TBAA to a generic call, so I removed the CallOp mentioning from the comment. Thanks!

360

Thanks!

vzakhari updated this revision to Diff 486047.Jan 3 2023, 12:08 PM
  • Changed ModuleImport to translate TBAA metadata as-is.
  • Moved cycles detection to LLVM dialect verification.
vzakhari edited the summary of this revision. (Show Details)Jan 3 2023, 12:09 PM
vzakhari updated this revision to Diff 486062.Jan 3 2023, 1:04 PM

This is a rebase with a minimal change for StoreOp's mlirBuilder to pass LIT testing.

More changes to follow D140556 will come next.

vzakhari updated this revision to Diff 486084.Jan 3 2023, 2:24 PM
  • Moved setAAMetadata call to setMetadataAttrs.

@gysit I suppose the TBAA conversion from LLVM IR to LLVMIR dialect operations should also be part of LLVMImportDialectInterface/LLVMDialectLLVMIRImportInterface, correct?

gysit added a comment.Jan 4 2023, 2:12 AM

@gysit I suppose the TBAA conversion from LLVM IR to LLVMIR dialect operations should also be part of LLVMImportDialectInterface/LLVMDialectLLVMIRImportInterface, correct?

I commented on the setAAMetadata method. I would probably extend ModuleImport with a method to lookup the tbaa metadata symbol given the metadata node and then move the code to set the attribute on the operations to the LLVMDialectLLVMIRImportInterface.

mlir/lib/Target/LLVMIR/ModuleImport.cpp
28

nit: I think this is not needed anymore?

334

I think I would move the method to LLVMIRToLLVMTranslation.cpp and have a method similar to setProfilingAttrs (e.g. setTBAAAttrs). The moduleImport could then have a lookupTBBAAttr similar to the other lookup methods used to query import mappings. That way the two metadata imports would be more similar and if an external user wants to extend the tbaa import for their custom intrinsic they can use the lookup.

Also the kind == llvm::LLVMContext::MD_tbaa check should not be needed anymore and you can probably just return failure and the extensible metadata import will generate a failure. I will use your diagMD helper to improve the error message generated once your change landed.

403

Would it make sense to do this with two consecutive loops over nodesToConvert instead of using the steps?

511–512

It may make sense to add a test case in import-failure.ll or similar file to verify the error messages are actually generated as expected.

vzakhari added inline comments.Jan 4 2023, 11:32 AM
mlir/lib/Target/LLVMIR/ModuleImport.cpp
28

Thanks! Removed.

334

Thanks! Done.

403

This will require duplicating the checks to recognize the "kind" of the node we are processing, because different kinds require different processing and I want to give them different base names. I guess I could have created a classification utility to host all the checks, but since this is a local code I decided to go with the two-steps loop.

511–512

Sounds right. I will add the tests.

vzakhari updated this revision to Diff 486350.Jan 4 2023, 11:34 AM
vzakhari updated this revision to Diff 486373.Jan 4 2023, 1:13 PM
  • Rebase to fix the patch application failure.
gysit added a comment.Jan 5 2023, 2:49 AM

This will require duplicating the checks to recognize the "kind" of the node we are processing, because different kinds require different processing and I want to give them different base names. I guess I could have created a classification utility to host all the checks, but since this is a local code I decided to go with the two-steps loop.

Yeah I find the while loop quite hard to understand, which admittedly is also due to the complexity of the metadata. I think your idea of having lambdas to check if a node is a particular kind (e.g. isDescriptorNode isTagNode) sounds nice!

Apart from that the revision LGTM and I only had a few nit comments. @ftynse do you plan to have a second look?

mlir/lib/Target/LLVMIR/ModuleImport.cpp
415

nit: (copying Alex's comment since it got lost) don't use specific numbers of stack elements in vectors unless you have a strong reason to pick up, 10 feels arbitrary here.

Thank you for the review, @gysit! I will make changes in processTBAAMetadata.

mlir/lib/Target/LLVMIR/ModuleImport.cpp
415

Thank you! I missed it.

ftynse accepted this revision.Jan 5 2023, 9:29 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
55

Then we shouldn't have this attribute here. (I suppose same is true for the other unprefixed attributes). It's an inherent attribute of the op, it should be entirely managed by the op. The dialect-level attributes must be prefixed with the dialect namespace, and are managed (e.g., verified) by the dialect. Dialect-level attribute are typically intended to be attached to operations from other dialects.

I don't think you need a special attribute kind, just reshuffle the code a bit.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
2585–2587

Would it make sense to use SymbolRefAttr instead? Attributes are unique pointers, so we'd save the cost of hashing strings and trivially hash the pointer.

mlir/lib/Target/LLVMIR/ModuleImport.cpp
359–361

Ack.

360

I agree. I think it makes sense to declare MetadataOp as SingleBlock and NoTerminator, but not as part of this change.

Works for me, thanks.

398–399

Sounds reasonable. We can have a type containing a pointer to itself, and the LLVM dialect modeling will have a recursive Type object for that, hence the anticipation that it's better to check.

415

Looks like the new patch wasn't uploaded?

This revision is now accepted and ready to land.Jan 5 2023, 9:29 AM
vzakhari added inline comments.Jan 5 2023, 2:14 PM
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
55

Please check if the latest patch does it right.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
2585–2587

Makes sense. Thanks!

mlir/lib/Target/LLVMIR/ModuleImport.cpp
415

Should be fixed now.

vzakhari updated this revision to Diff 486675.Jan 5 2023, 2:15 PM

Fixed tbaa attribute name + other clean-up changes.

vzakhari requested review of this revision.Jan 5 2023, 2:15 PM
vzakhari added inline comments.Jan 5 2023, 2:21 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
2585–2587

I suppose using StringAttr should be okay too. It just makes it easier to combine keys that are symbol definitions and symbol references in the same map.

gysit accepted this revision.Jan 5 2023, 11:49 PM

Thanks for the nice improvement!

This revision is now accepted and ready to land.Jan 5 2023, 11:49 PM

Thank you for the review, @ftynse and @gysit! I will go ahead with merging it and making follow-up changes for MetadataOp. Please feel free to add post-commit comments, if something looks off in the latest patch.

ftynse added inline comments.Jan 6 2023, 8:57 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
55

I was thinking the other way around: this becomes an inherent attribute of the ops, and the verifier is attached via an OpTrait. But the approach with dialect attribute also works. Just be aware that transforms may drop it (which sounds closer to LLVM IR in spirit).

vzakhari added inline comments.Jan 6 2023, 10:35 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td
55

Oh, I thought you were actually in favor of the dialect attribute, since this discussion started with the missing llvm. prefix. I will leave it as-is for now.

To be on the same page, I think you suggest:

  • Keep using OptionalAttr<SymbolRefArrayAttr>:$tbaa argument for LoadOp, StoreOp, etc.
  • Add a TBAATrait with appropriate verification for the tbaa attribute.
  • Probably add TBAAAttrInteface to generalize code in TBAATrait::verifyTrait().

Did I catch it correctly?

This revision was automatically updated to reflect the committed changes.