This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add a `GlobalSymbol` trait.
Abandoned · Public

Authored by fmorac on Jun 29 2023, 8:38 AM.

Details

Summary

For an explanation of these patches see D154153.

Commit message:
Currently there's no mechanism to indicate that a particular symbol should be
treated as a global symbol. This patch adds such a trait and modifies the LLVM
translation mechanism to be aware of the trait and to convert the operation
alongside all the other LLVM globals.
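
For context, a native MLIR op trait of this kind is declared with the usual TraitBase boilerplate; a minimal sketch (not the actual diff, which may also add verification hooks) looks like:

#include "mlir/IR/OpDefinition.h"

namespace mlir {
namespace OpTrait {
/// Marks a symbol-defining operation as a module-level global, so the LLVM
/// translation can convert it together with the other globals.
template <typename ConcreteType>
class GlobalSymbol : public TraitBase<ConcreteType, GlobalSymbol> {};
} // namespace OpTrait
} // namespace mlir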

Diff Detail

Event Timeline

fmorac created this revision. Jun 29 2023, 8:38 AM
Herald added a reviewer: ftynse.
Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.
fmorac updated this revision to Diff 536061. Jun 29 2023, 5:18 PM

Rebasing.

fmorac edited the summary of this revision. Jun 29 2023, 6:10 PM
fmorac edited the summary of this revision. Jun 29 2023, 6:15 PM
fmorac added a reviewer: mehdi_amini.
fmorac published this revision for review. Jun 30 2023, 6:27 AM

What does it mean for a symbol to have "global scope"?

What does it mean for a symbol to have "global scope"?

I don't have a good definition, but here's how I see global scope: if there were a Module trait, then a global-scope op would be one whose parent carries that Module trait.
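
In code, that reading could be captured by a hypothetical helper like the one below (illustrative only, not part of the patch); since there is no Module trait today, it approximates "parent with the Module trait" as "parent is the builtin module":

#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Operation.h"

// Hypothetical helper: treat an op as having "global scope" if its immediate
// parent is the builtin module (standing in for a "Module trait").
static bool hasGlobalScope(mlir::Operation *op) {
  return llvm::isa_and_nonnull<mlir::ModuleOp>(op->getParentOp());
}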

Unfortunately, I don't understand what this is intended to model at the moment. Do you have an example of why this is needed?

Unfortunately, I don't understand what this is intended to model at the moment. Do you have an example of why this is needed?

Yes, so I use it in the gpu.binary op in D154132. The reason I need it there is that the binary embedding should (ideally) be performed before converting the gpu.launch_func op, but without this trait the gpu.launch_func op would be converted first because it's inside a function body: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L1411-L1446 .

For me it also makes sense because it allows translating global ops that are not from the MLIR LLVM dialect in a given order, i.e., translating global ops before function bodies.
As to why I view the gpu.binary op as a global op and not just a binary string: the embedding can be performed in multiple ways. The default MLIR way is just adding a string; however, Clang Offload annotations embed it as a string coupled with other annotations, and a CUDA embedding would be the fat binary plus the kernel stubs.

fmorac updated this revision to Diff 538786. Jul 10 2023, 1:07 PM

Rebasing.

krzysz00 added inline comments.
mlir/include/mlir/IR/SymbolTable.h
453

I don't think "global scope" is the right wording here, given what this trait is doing.

Heck, the name is one I'm iffy on, though I don't have any clearer suggestions.

But, in any case, how's this:

"Trait for operations that define symbols which, within the context of a module, define global values, whether variable or constant. These operations must also implement the Symbol trait and cannot produce results. Examples of such operations include llvm.global, memref.global, and gpu.binary.

This trait is not meant for operations that define functions, modules, or other values that have more complex internal structure."

And then, perhaps, applying the GlobalSymbol

fmorac updated this revision to Diff 543095. Jul 21 2023, 3:23 PM

Changed the trait's explanation.

Unfortunately, I don't understand what this is intended to model at the moment. Do you have an example of why this is needed?

Yes, so I use it in the gpu.binary op in D154132.

So I see that you're adding it to the op there, but can you also point me to where it is used, then?

fmorac marked an inline comment as done (edited). Jul 24 2023, 4:25 AM

So I see that you're adding it to the op there, but can you also point me to where it is used, then?

The only places it's being used in C++ code are in this patch, lines 811 & 1416 of ModuleTranslation.cpp. The sole purpose of the trait right now is to allow certain symbols to be translated alongside LLVM::GlobalOp and before all non-LLVM ops. Thus you only have to attach it to the op's trait list, and the modifications in this patch take care of the rest.
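
For illustration, the kind of check this amounts to looks roughly like the following (a sketch in the style of the surrounding translation code, not the literal lines from the diff; getModuleBody, translator, and llvmBuilder are the names already used in ModuleTranslation.cpp):

for (Operation &o : getModuleBody(module).getOperations()) {
  // Ops carrying the new trait are converted here, together with the LLVM
  // globals, i.e. before function bodies and other non-LLVM ops.
  if (o.hasTrait<OpTrait::GlobalSymbol>() &&
      failed(translator.convertOperation(o, llvmBuilder)))
    return nullptr;
}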

So I see that you're adding it to the op there, but can you also point me to where it is used, then?

The only places it's being used in C++ code are in this patch, lines 811 & 1416 of ModuleTranslation.cpp. The sole purpose of the trait right now is to allow certain symbols to be translated alongside LLVM::GlobalOp and before all non-LLVM ops. Thus you only have to attach it to the op's trait list, and the modifications in this patch take care of the rest.

OK I see. I think @ftynse should chime in here about why this is needed: the translation code does not say why we phase things such that we first process "globals" and then functions. Is this because we need to have definitions before references? I would think that we could have a two-pass model: first translate the declaration for every global (that includes functions in LLVM terminology I believe), and then convert the definitions.
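
A rough sketch of that two-pass model (isGlobalLike, declareGlobal, and defineGlobal are hypothetical placeholders, not existing ModuleTranslation methods):

// Pass 1: emit a declaration for every global and function so that later
// references always resolve, regardless of textual order.
for (Operation &o : getModuleBody(module).getOperations())
  if (isGlobalLike(&o) || isa<LLVM::LLVMFuncOp>(&o))
    if (failed(declareGlobal(&o)))   // hypothetical: declaration only
      return nullptr;

// Pass 2: fill in the definitions (initializers and function bodies).
for (Operation &o : getModuleBody(module).getOperations())
  if (isGlobalLike(&o) || isa<LLVM::LLVMFuncOp>(&o))
    if (failed(defineGlobal(&o)))    // hypothetical: body/initializer
      return nullptr;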

gysit added inline comments. Jul 25 2023, 12:29 AM
mlir/include/mlir/IR/SymbolTable.h
460

Is the trait marking all non-function global objects (= GlobalVariable in LLVM terminology, https://llvm.org/doxygen/classllvm_1_1GlobalObject.html)? If so, I would consider naming the trait GlobalVariable or GlobalVariableLike.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
810

Should the existing global variable operations have the trait as well?

OK I see. I think @ftynse should chime in here about why this is needed: the translation code does not say why we phase things such that we first process "globals" and then functions. Is this because we need to have definitions before references? I would think that we could have a two-pass model: first translate the declaration for every global (that includes functions in LLVM terminology I believe), and then convert the definitions.

In the future I'd argue we should even consider introducing interfaces for translation, like having an LLVMFuncTranslationInterface with methods for translating the signature and the body instead of only looking at LLVMFuncOp. That would allow non-LLVM dialects to get the same treatment as the LLVM dialect, instead of being relegated to translation at the end, see: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L1383-L1395 . But that's out of scope for this patch.
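
Purely as a sketch of the idea, such an interface might have roughly this shape (the real thing would be an MLIR OpInterface declared in ODS, and all method names here are hypothetical):

#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "mlir/IR/Operation.h"
#include "mlir/Support/LogicalResult.h"

// Illustrative shape only, not a real MLIR interface.
struct FuncTranslationInterfaceSketch {
  virtual ~FuncTranslationInterfaceSketch() = default;
  // Emit only the llvm::Function declaration (name, signature, attributes).
  virtual llvm::Function *translateSignature(mlir::Operation *funcLikeOp,
                                             llvm::Module &llvmModule) = 0;
  // Emit the body into the previously created declaration.
  virtual mlir::LogicalResult translateBody(mlir::Operation *funcLikeOp,
                                            llvm::Function &decl) = 0;
};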

mlir/include/mlir/IR/SymbolTable.h
460

I avoided LLVM names because the trait could be applicable in non-LLVM contexts; however, the name is open to change, and I like GlobalVariableLike or GlobalSymbolLike.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
810

That's a more difficult conversation, because LLVM globals are already handled in a certain way, but I'd argue yes, provided there are other refactors (see my non-inline comment).

To be clear: my previous comment is trying to find an alternative solution, as I'm not convinced by this trait as it is at the moment.

To be clear: my previous comment is trying to find an alternative solution, as I'm not convinced by this trait as it is at the moment.

Oh OK, I see what you mean. LLVM translation has the following pattern (ModuleTranslation.cpp):

if (failed(translator.convertFunctions()))
  return nullptr;

// Convert other top-level operations if possible.
llvm::IRBuilder<> llvmBuilder(llvmContext);
for (Operation &o : getModuleBody(module).getOperations()) {
  if (!isa<LLVM::LLVMFuncOp, LLVM::GlobalOp, LLVM::GlobalCtorsOp,
           LLVM::GlobalDtorsOp, LLVM::MetadataOp, LLVM::ComdatOp>(&o) &&
      !o.hasTrait<OpTrait::IsTerminator>() &&
      failed(translator.convertOperation(o, llvmBuilder))) {
    return nullptr;
  }
}

If we change it to:

// Convert other top-level operations if possible.
llvm::IRBuilder<> llvmBuilder(llvmContext);
for (Operation &o : getModuleBody(module).getOperations()) {
  if (!isa<LLVM::LLVMFuncOp, LLVM::GlobalOp, LLVM::GlobalCtorsOp,
           LLVM::GlobalDtorsOp, LLVM::MetadataOp, LLVM::ComdatOp>(&o) &&
      !o.hasTrait<OpTrait::IsTerminator>() &&
      failed(translator.convertOperation(o, llvmBuilder))) {
    return nullptr;
  }
}

if (failed(translator.convertFunctions()))
  return nullptr;

The issue for gpu.binary is solved; however, I also don't know if that breaks something else (I think it shouldn't, as no op should depend on the body of a function).

Can you explain why the gpu.binary operation needs to be translated before the gpu.launch? Couldn't the gpu.launch conversion introduce an empty global variable that then is finalized once you translate the gpu.binary operation? The latter could then use getOrInsertGlobal to get and modify the global introduced during the conversion of the gpu.launch. This would make the translation robust against changing the order in which things are converted.
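
For reference, the order-independent pattern being suggested would look roughly like this on the LLVM side (a sketch only; the symbol name and the helper are placeholders, not what gpu.binary actually emits):

#include "llvm/IR/Constants.h"
#include "llvm/IR/Module.h"

// Sketch: whichever translation runs first creates the placeholder global;
// the other one looks it up via getOrInsertGlobal and completes it.
static llvm::GlobalVariable *getOrCreateBinaryGlobal(llvm::Module &m,
                                                     llvm::StringRef bytes) {
  auto *i8Ty = llvm::Type::getInt8Ty(m.getContext());
  auto *arrTy = llvm::ArrayType::get(i8Ty, bytes.size());
  // Returns the existing global if one was already inserted (e.g. while
  // translating gpu.launch_func); otherwise inserts a declaration.
  auto *gv = llvm::cast<llvm::GlobalVariable>(
      m.getOrInsertGlobal("gpu_binary_blob", arrTy));
  if (!gv->hasInitializer())
    gv->setInitializer(llvm::ConstantDataArray::getString(
        m.getContext(), bytes, /*AddNull=*/false));
  return gv;
}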

The issue for gpu.binary is solved; however, I also don't know if that breaks something else (I think it shouldn't, as no op should depend on the body of a function).

I think I prefer moving the function conversion after the conversion of the non-LLVM dialect globals over introducing the trait, but I am unsure if this has undesired side-effects (I would assume no).

Can you explain why the gpu.binary operation needs to be translated before the gpu.launch? Couldn't the gpu.launch conversion introduce an empty global variable that then is finalized once you translate the gpu.binary operation? The latter could then use getOrInsertGlobal to get and modify the global introduced during the conversion of the gpu.launch. This would make the translation robust against changing the order in which things are converted.

The issue is that not all gpu.binary ops have to decay into simple strings; they could decay into a string plus function stubs in the case of a CUDA-like translation. Thus, to avoid any potential issues with other kinds of translations, it's better to always translate gpu.binary first.

I think I prefer moving the function conversion after the conversion of the non-LLVM dialect globals over introducing the trait, but I am unsure if this has undesired side-effects (I would assume no).

Abandoning in favor of D156284 (I'll wait until all the other patches get through before abandoning).

gysit added a comment. Jul 26 2023, 7:15 AM

The issue is that not all gpu.binary ops have to decay into simple strings; they could decay into a string plus function stubs in the case of a CUDA-like translation. Thus, to avoid any potential issues with other kinds of translations, it's better to always translate gpu.binary first.

Just out of curiosity, do you have an example in mind where that would be a problem? At first glance it sounds like it doesn't matter too much whether gpu.binary lowers to one op or to several.

fmorac added a comment (edited). Jul 26 2023, 8:04 AM

The issue is that not all gpu.binary ops have to decay into simple strings; they could decay into a string plus function stubs in the case of a CUDA-like translation. Thus, to avoid any potential issues with other kinds of translations, it's better to always translate gpu.binary first.

Just out of curiosity, do you have an example in mind where that would be a problem? At first glance it sounds like it doesn't matter too much whether gpu.binary lowers to one op or to several.

OK, before the example, some context: GPU binaries don't have to come from gpu.modules; they could come from an external source, e.g., a CUDA binary compiled with nvcc.

If the current translation order remains unaltered, then gpu.binary will always be translated after gpu.launch_func. Here's a scenario where that's a potential issue:

  • Let's say you have an external CUDA binary and you create a gpu.binary out of that external object.
  • In this scheme the ObjectManager takes the binary, embeds it into the IR, and creates a function stub for every kernel in the binary.
  • When translating gpu.launch_func, the ObjectManager searches for the stub and creates the kernel launch call.

If you translate gpu.binary before gpu.launch_func, then the above scheme works perfectly, and you can verify that each gpu.launch_func is calling a real function: if the stub is not there, you get an error.
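
In that scheme, the gpu.launch_func side reduces to a lookup along these lines (a sketch only; llvmModule, kernelStubName, builder, launchArguments, and the error handling are placeholders, not the ObjectManager's actual API):

// Sketch: the stub created while embedding the binary must already exist;
// failing to find it is a hard error, which is the verification mentioned above.
llvm::Function *stub = llvmModule.getFunction(kernelStubName);
if (!stub)
  return failure(); // gpu.binary was not translated first, or the kernel is missing
builder.CreateCall(stub, launchArguments);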

You could instead re-read the binary almost every time you encounter a launch_func, create the stub, and verify the kernel launch; however, that's a waste of resources.

Also, there could be more complicated schemes where it's not practical to translate the binary after the kernel launch.

gysit added a comment. Jul 26 2023, 8:11 AM

Thanks for the explanation. Yeah, then reordering the translation steps may make sense in this case!

fmorac abandoned this revision. Jul 28 2023, 6:13 AM