This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add translation parameter to mlir-cpu-runner.
AbandonedPublic

Authored by springerm on Jan 30 2021, 1:26 AM.

Details

Summary

The current mlir-cpu-runner implementation always uses
translateModuleToLLVMIR. With this change, the translation can be specified
via a command-line parameter. That makes it possible to write integration
tests for other dialects such as LLVMAVX512Dialect.

Depends On D95678.

Diff Detail

Event Timeline

springerm created this revision.Jan 30 2021, 1:26 AM
springerm requested review of this revision.Jan 30 2021, 1:26 AM
Herald added a project: Restricted Project.Jan 30 2021, 1:26 AM
springerm retitled this revision from Add translation parameter to mlir-cpu-runner. to [mlir] Add translation parameter to mlir-cpu-runner..Jan 30 2021, 1:32 AM
rriddle requested changes to this revision.Jan 30 2021, 1:43 AM

I'm not sure I agree with the direction here. I don't think we should be hardcoding anything LLVM related in the Translation library.

This revision now requires changes to proceed.Jan 30 2021, 1:43 AM
springerm added a comment.EditedJan 30 2021, 2:02 AM

I see. I wanted to reuse some of the existing TranslationRegistry infrastructure, but I can change it so that the entire new logic lives in mlir/lib/ExecutionEngine.

I'm not sure I agree with the direction here. I don't think we should be hardcoding anything LLVM related in the Translation library.

Atm I am not aware of MLIR translating to anything other than LLVM. Is your thinking that the Translation library is/will be the supporting library for translating from MLIR to future targets?
In that case, should we start a new binary mlir-translate-to-llvmir that is LLVM-specific?
Or do you have something else in mind?

I'm not sure I agree with the direction here. I don't think we should be hardcoding anything LLVM related in the Translation library.

I agree, this patch is really not clear to me right now.
I'd rather revisit our general translation mechanism to LLVM and the specifics of mlir-cpu-runner than change the fundamentals of mlir-translate.

Atm I am not aware of MLIR translating to anything other than LLVM. Is your thinking that the Translation library is/will be the supporting library for translating from MLIR to future targets?

In tree we have at least SPIRV, off the top of my head. Out-of-tree it is used for emitting GraphDef, TFLite flatbuffers, HLO protos, etc.

mlir/include/mlir/Target/LLVMIR.h
38

Why is this changing? (the doc still says "module")

How about changing the commit as follows:

  • In ConvertToLLVMIR.cpp / LLVMAVX512Intr.cpp, etc.: In addition to doing the TranslateFromMLIRRegistration, perform a second TranslateFromMLIRToLLVMRegistration that registers the corresponding translateModuleToLLVMIR / translateLLVMAVX512ModuleToLLVMIR function.
  • TranslateFromMLIRToLLVMRegistration is defined in lib/ExecutionEngine/Translation.cpp, so there are no changes to mlir-translate.
  • Define a new JitRunnerTranslationParser in lib/ExecutionEngine/Translation.cpp. Basically a copy of the TranslationParser of mlir-translate, but this one will be used only for mlir-cpu-runner.

This would avoid mixing mlir-translate with LLVM-specific stuff.

But it also duplicates the entire TranslationParser / Translate [...] Registration logic, and translation passes in Target/LLVMIR/*.cpp will essentially be registered twice, even though most TranslateFromMLIRRegistrations (apart from the SPIRV one) are just wrappers around translate*ToLLVMIR, e.g.:

void registerAVX512ToLLVMIRTranslation() {
  TranslateFromMLIRRegistration reg(
      "avx512-mlir-to-llvmir",
      [](ModuleOp module, raw_ostream &output) {
        llvm::LLVMContext llvmContext;
        auto llvmModule = translateLLVMAVX512ModuleToLLVMIR(
            module, llvmContext, "LLVMDialectModule");
        if (!llvmModule)
          return failure();

        llvmModule->print(output, nullptr);
        return success();
      },
      [](DialectRegistry &registry) {
        registry.insert<LLVM::LLVMAVX512Dialect, LLVM::LLVMDialect>();
      });
}

Here, registerAVX512ToLLVMIRTranslation is just a wrapper around translateLLVMAVX512ModuleToLLVMIR (which calls LLVM::ModuleTranslation::translateModule). Actually, all of those functions in lib/Target/LLVMIR/*.cpp look the same, so there's a chance to remove some code duplication. (Which I attempted to do in TranslateFromMLIRToLLVMRegistration::TranslateFromMLIRToLLVMRegistration in this commit.)

Another approach would be to move the entire TranslationParser / Translate [...] Registration logic into a new component, which can be used by both mlir-cpu-runner and mlir-translate. Since they both perform "MLIR --> something" translations, it would make sense for them to share some code (such as the registration of available translations).

Any other ideas? What do you think?

springerm added inline comments.Jan 30 2021, 7:55 PM
mlir/include/mlir/Target/LLVMIR.h
38

Forgot to update the comment.

This function is inconsistent with the other translate*ToLLVMIR functions such as translateModuleToNVVMIR or translateLLVMArmSVEModuleToLLVMIR. These functions all take an Operation*.

By changing ModuleOp to Operation*, it was easier to register these functions with TranslationRegistry<TranslateMLIRToLLVMFunction>, now that they all have the same signature.

mlir/include/mlir/Target/LLVMIR.h
38

This seems more like an oversight in the functions that are called translateXXXModuleYYY but take an Operation*?
I'd go the other way around: make all of those take a ModuleOp.

The root of the problem is that we have multiple dialects that are involved in a direct translation to LLVM IR without going to the LLVM dialect itself. This is a departure from the original design where everything should be transformed into the LLVM dialect before translation (which ensures also proper round-tripping from LLVM IR to LLVM dialect).

Assuming we want to endorse this new paradigm, I'd look into making the general Translate MLIR To LLVM pluggable to dialects that provide direct conversion.
To do this, I'd keep a single entry point mlir-to-llvmir and remove all the others (avx512-mlir-to-llvmir should not exist). The implementation LLVM::ModuleTranslation::convertOperation could be final. The implementation can simply lookup the dialect for each operation and dispatch the conversion through a new interface.

This way we don't rely on any global static registration (other than the existing MLIRContext) for the translation, new Dialect can easily plug into the LLVM translation, including out-of-tree dialects.

ftynse added a comment.Feb 1 2021, 1:51 AM

The root of the problem is that we have multiple dialects that are involved in a direct translation to LLVM IR without going to the LLVM dialect itself. This is a departure from the original design where everything should be transformed into the LLVM dialect before translation (which ensures also proper round-tripping from LLVM IR to LLVM dialect).

Assuming we want to endorse this new paradigm, I'd look into making the general Translate MLIR To LLVM pluggable to dialects that provide direct conversion.
To do this, I'd keep a single entry point mlir-to-llvmir and remove all the others (avx512-mlir-to-llvmir should not exist). The implementation LLVM::ModuleTranslation::convertOperation could be final. The implementation can simply lookup the dialect for each operation and dispatch the conversion through a new interface.

This way we don't rely on any global static registration (other than the existing MLIRContext) for the translation, new Dialect can easily plug into the LLVM translation, including out-of-tree dialects.

I agree with this assessment. FWIW, we have de facto diverged from the original design some time ago. Specifically, when we decided to put all target-specific LLVM IR intrinsics into separate dialects (nvvm was actually the first, but avx, neon and sve are in the same situation). The reason for that is to avoid having thousands of unused ops in the main LLVM dialect with all the increased resource usage it entails. Then there is the OpenMP dialect, which uses OpenMPIRBuilder to emit loads of LLVM IR from a single MLIR operation bypassing the LLVM dialect. Furthermore, one may have OpenMP and, e.g., AVX simultaneously, so having -translate-something-to-llvm doesn't scale well because "something" may be a combination. I have been thinking about this in the background and my general inclination is to go towards more pluggability, similarly to what you suggest. Yes, making it easier to bypass the LLVM dialect when targeting LLVM IR may create wrong incentives, but it isn't that hard to create a new translation today.

There is one layering issue: if we want each dialect to know how to lower itself to LLVM IR, this creates a dependency from that dialect to LLVM IR core libraries and, potentially, to Translation. Arguably, this isn't something we want if we are just using the dialects, e.g., in mlir-opt. It took me significant time to get rid of the dependency on thread-unsafe LLVMContext in the LLVM dialect so I'd prefer not running a risk of putting that back in. I would be happy with some more dynamic approach.

ftynse added inline comments.Feb 1 2021, 1:54 AM
mlir/include/mlir/Target/LLVMIR.h
38

They can't. NVVM takes a gpu.module IIRC.

The root of the problem is that we have multiple dialects that are involved in a direct translation to LLVM IR without going to the LLVM dialect itself. This is a departure from the original design where everything should be transformed into the LLVM dialect before translation (which ensures also proper round-tripping from LLVM IR to LLVM dialect).

There is one layering issue: if we want each dialect to know how to lower itself to LLVM IR, this creates a dependency from that dialect to LLVM IR core libraries and, potentially, to Translation. Arguably, this isn't something we want if we are just using the dialects, e.g., in mlir-opt. It took me significant time to get rid of the dependency on thread-unsafe LLVMContext in the LLVM dialect so I'd prefer not running a risk of putting that back in. I would be happy with some more dynamic approach.

An additional layering issue to consider here is that some intrinsics (e.g. ARM) but not all (e.g. AVX512) are resolved in an overloaded fashion by the LLVM Builder API Intrinsic::getDeclaration.
I.e. one can't reliably construct the "intrinsic function name" to build an llvm.call without either calling the LLVM Builder API or copy-pasting its internals, both of which seem unacceptable.

ftynse added a comment.Feb 1 2021, 4:57 AM

The root of the problem is that we have multiple dialects that are involved in a direct translation to LLVM IR without going to the LLVM dialect itself. This is a departure from the original design where everything should be transformed into the LLVM dialect before translation (which ensures also proper round-tripping from LLVM IR to LLVM dialect).

There is one layering issue: if we want each dialect to know how to lower itself to LLVM IR, this creates a dependency from that dialect to LLVM IR core libraries and, potentially, to Translation. Arguably, this isn't something we want if we are just using the dialects, e.g., in mlir-opt. It took me significant time to get rid of the dependency on thread-unsafe LLVMContext in the LLVM dialect so I'd prefer not running a risk of putting that back in. I would be happy with some more dynamic approach.

An additional layering issue to consider here is that some intrinsics (e.g. ARM) but not all (e.g. AVX512) are resolved in an overloaded fashion by the LLVM Builder API Intrinsic::getDeclaration.
I.e. one can't reliably construct the "intrinsic function name" to build an llvm.call without either calling the LLVM Builder API or copy-pasting its internals, both of which seem unacceptable.

I don't think we want to lower ops that correspond to intrinsics to LLVM dialect function calls. So this is the same layering issue of the "intrinsic-derived" dialect having a dependency on LLVM libraries.

There is one layering issue: if we want each dialect to know how to lower itself to LLVM IR, this creates a dependency from that dialect to LLVM IR core libraries and, potentially, to Translation. Arguably, this isn't something we want if we are just using the dialects, e.g., in mlir-opt. It took me significant time to get rid of the dependency on thread-unsafe LLVMContext in the LLVM dialect so I'd prefer not running a risk of putting that back in. I would be happy with some more dynamic approach.

Ah, this is an excellent point as well. This makes it tricky to rely on dialect interfaces; maybe we'd need some injection into the Dialect registration for these dialects.

Let me try to rephrase, to make sure that I understood correctly.

To do this, I'd keep a single entry point mlir-to-llvmir and remove all the others (avx512-mlir-to-llvmir should not exist). The implementation LLVM::ModuleTranslation::convertOperation could be final. The implementation can simply lookup the dialect for each operation and dispatch the conversion through a new interface.

What you are suggesting is:

  • There is only one entry point in the code for converting to LLVMIR: LLVM::ModuleTranslation::translateModule()
  • We no longer override translateModule. No subclasses of LLVM::ModuleTranslation are needed and the entire ModuleTranslation class can be final. (However, there's one more subclass in Google3 that we have to think about.)
  • ModuleTranslation::convertOperation should be able to translate all kinds of operations. This can be implemented similarly to how OpenMP operations are handled at the moment: we check the operation's dialect and then dispatch to the corresponding convert*Operation function:
if (opInst.getDialect() == ompDialect)
  return convertOmpOperation(opInst, builder);
  • All per-dialect translations such as avx512-mlir-to-llvmir will be removed (for both mlir-cpu-runner and mlir-translate).

This way we don't rely on any global static registration (other than the existing MLIRContext) for the translation, new Dialect can easily plug into the LLVM translation, including out-of-tree dialects.

What is not clear to me, however, is how translateModule knows which dialects exist (as well as the pointers to their corresponding convert*Operation functions). I think we may still have to statically register each dialect that converts to LLVMIR in a map.

This way we don't rely on any global static registration (other than the existing MLIRContext) for the translation, new Dialect can easily plug into the LLVM translation, including out-of-tree dialects.

What is not clear to me, however, is how translateModule knows which dialects exist (as well as the pointers to their corresponding convert*Operation functions). I think we may still have to statically register each dialect that converts to LLVMIR in a map.

That is the part where I'd have used a Dialect Interface. Instead of:

if (opInst.getDialect() == ompDialect)
   return convertOmpOperation(opInst, builder);

We'd do:

if (auto *translateInterface = opInst.getDialect()->getRegisteredInterface<LLVMTranslateInterface>())
   return translateInterface->convert(opInst, builder);

That way the LLVM dialect does not need to be taught about the other dialects and it can dispatch in a totally decoupled way.

The only problem here is the one @ftynse mentioned: in order to provide an implementation for the interface, the other dialects would have a build dependency on LLVM IR, which not every MLIR client using these dialects may want. Decoupling the interface from the dialect gets a bit more tricky.

The only problem here is the one @ftynse mentioned: in order to provide an implementation for the interface, the other dialects would have a build dependency on LLVM IR, which not every MLIR client using these dialects may want. Decoupling the interface from the dialect gets a bit more tricky.

I think that's already the case. There is the following dependency chain in the BUILD file: mlir-opt <- AllPassesAndDialectsNoRegistration <- LLVMDialect <- llvm:Core.

In general, it makes sense that we would want to avoid building LLVM stuff when just building mlir-opt. Off the top of my head, here's one way of separating LLVM dependencies from LLVMDialect/NVVMDialect/etc.: have the LLVMTranslateInterface implementations in a build target separate from the dialect build target. In fact, there would be two new build targets that provide implementations of the LLVMTranslateInterface interface: one dummy target (with no LLVM dependencies) and the "real" target (with all the LLVM dependencies). mlir-opt would depend on (link) the dummy build target and mlir-translate would depend on the "real" target.

This would have to be designed carefully though, as accidentally including both the dummy and the real build target (or neither) would result in a linker error.

The only problem here is the one @ftynse mentioned: in order to provide an implementation for the interface, the other dialects would have a build dependency on LLVM IR, which not every MLIR client using these dialects may want. Decoupling the interface from the dialect gets a bit more tricky.

I think that's already the case. There is the following dependency chain in the BUILD file: mlir-opt <- AllPassesAndDialectsNoRegistration <- LLVMDialect <- llvm:Core.

But only for the LLVM Dialects, I don't expect the other dialects you want to lower to LLVM to depend on LLVM Core right now.

In general, it makes sense that we would want to avoid building LLVM stuff when just building mlir-opt.

Actually mlir-opt is a testing/debugging/development tool, and the fact that it is bloated isn't a big deal. However, the other libraries are ultimately intended to be assembled into compilers, and preserving modularity is important :)

Off the top of my head, here's one way of separating LLVM dependencies from LLVMDialect/NVVMDialect/etc.

Have the LLVMTranslateInterface implementations in a build target separate from the dialect build target. In fact, there would be two new build targets that provide implementations of the LLVMTranslateInterface interface: one dummy target (with no LLVM dependencies) and the "real" target (with all the LLVM dependencies). mlir-opt would depend on (link) the dummy build target and mlir-translate would depend on the "real" target.

This would have to be designed carefully though, as accidentally including both the dummy and the real build target (or neither) would result in a linker error.

I don't quite get how this would work though? At the moment interfaces are registered in a dialect constructor: you can't build the dialect without the interfaces it defines.
We could make it so that you could register dialect interfaces separately / after the fact, but how would this work? The registration for the dialect is explicit, so would you have another function "registerAVX512TranslationInterface()" to add the interface, and tools that want it would need to register the interface?

But only for the LLVM Dialects, I don't expect the other dialects you want to lower to LLVM to depend on LLVM Core right now.

And only because we haven't implemented the data layout yet :) There's nothing else that we use AFAIK.

I don't quite get how this would work though? At the moment interfaces are registered in a dialect constructor: you can't build the dialect without the interfaces it defines.
We could make it so that you could register dialect interfaces separately / after the fact, but how would this work? The registration for the dialect is explicit, so would you have another function "registerAVX512TranslationInterface()" to add the interface, and tools that want it would need to register the interface?

I think the idea was more along the lines of the following. The interface actually dispatches to something like LogicalResult translateLLVMDialectOperation(Operation *, TranslationState &). This function is only declared in the interface. It has two definitions: one in libDummyLLVMTranslation, "implemented" as llvm_unreachable, and the other in libLLVMTranslation, containing the actual implementation. The build scripts link libDummyLLVMTranslation for everything but mlir-translate and similar uses.

This sounds a bit fragile to me, so I was considering some sort of dependency injection. Like the interface has a setTranslationImpl(function_ref<LogicalResult (Operation *, TranslationState &)>) (strictly speaking, this is no longer an interface because it will have to have state). By default, the translation impl is a function that always fails. Clients can override it after registering dialects with the actual translation function provided by a separate library. So, in JitRunnerMain.cpp, we can have context.getOrLoadDialect<LLVMDialect>()->getRegisteredInterface<LLVMTranslationInterface>().setTranslationImpl(llvmTranslation). Here, llvmTranslation is defined in a separate library that only the clients that need it will depend on. This sounds like too much indirection though; maybe LLVM ModuleTranslation can just have a map dialect -> function and dispatch calls itself without going through interfaces.

This sounds a bit fragile to me, so I was considering some sort of dependency injection. Like the interface has a setTranslationImpl(function_ref<LogicalResult (Operation *, TranslationState &)>) (strictly speaking, this is no longer an interface because it will have to have state). By default, the translation impl is a function that always fails. Clients can override it after registering dialects with the actual translation function provided by a separate library. So, in JitRunnerMain.cpp, we can have context.getOrLoadDialect<LLVMDialect>()->getRegisteredInterface<LLVMTranslationInterface>().setTranslationImpl(llvmTranslation). Here, llvmTranslation is defined in a separate library that only the clients that need it will depend on. This sounds like too much indirection though; maybe LLVM ModuleTranslation can just have a map dialect -> function and dispatch calls itself without going through interfaces.

This requires pre-loading dialects though, which isn't desirable in general. It seems to me that these are all convoluted ways to carry some "state", so why not use the existing "state" we have and register the interface on the dialect separately?

This requires pre-loading dialects though, which isn't desirable in general. It seems to me that these are all convoluted ways to carry some "state", so why not use the existing "state" we have and register the interface on the dialect separately?

Is it just a matter of making Dialect::addInterfaces public?

This requires pre-loading dialects though, which isn't desirable in general. It seems to me that these are all convoluted ways to carry some "state", so why not use the existing "state" we have and register the interface on the dialect separately?

Is it just a matter of making Dialect::addInterfaces public?

Yes, I'm in favor of doing this.

This requires pre-loading dialects though, which isn't desirable in general. It seems to me that these are all convoluted ways to carry some "state", so why not use the existing "state" we have and register the interface on the dialect separately?

Is it just a matter of making Dialect::addInterfaces public?

Yes, I'm in favor of doing this.

Actually scratch that, that still requires a Dialect instance, it isn't that simple.

It looks like creating an interface requires a dialect instance anyway, so we'd need some mechanism of delayed registration, presumably in the context, that takes a dialect TypeID and a construction lambda, and registers an interface when and if the dialect is actually loaded.

Even this would require a context instance, which does not fit what's in mlir-cpu-runner.cpp right now:

int main(int argc, char **argv) {
  llvm::InitLLVM y(argc, argv);
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();
  llvm::InitializeNativeTargetAsmParser();
  mlir::initializeLLVMPasses();
  mlir::registerAllTranslations();

  return mlir::JitRunnerMain(argc, argv);
}

So we couldn't register translations in a similar fashion here, I think; we'd have to go the same route as the dialect registry, with a separate "InterfaceRegistry" of some sort that can be injected into the Context the same way.

ftynse added a comment.EditedFeb 3 2021, 1:03 AM

Well, that does not fit, because context creation is factored out into JitRunnerMain together with other code shared between CPU and CUDA runners. I don't see a strong reason not to make JitRunnerMain take a context instead of creating it here: https://github.com/llvm/llvm-project/blob/fae6d129dac27fa8c98c1680ac79af7b453ec7af/mlir/lib/ExecutionEngine/JitRunner.cpp#L333. (Also not registering all dialects, only those we actually need.)

Actually, the mechanism I have in mind can be put into DialectRegistry, so we wouldn't even need a context, just the registry.

I think we can drop this now?

nicolasvasilache resigned from this revision.May 17 2021, 4:58 AM
springerm abandoned this revision.May 17 2021, 5:33 AM