This is an archive of the discontinued LLVM Phabricator instance.

Fix namespace for MLIR Async Runtime
ClosedPublic

Authored by mjp41 on Jan 25 2021, 11:59 AM.

Details

Summary

The MLIR Async runtime uses different namespacing for the header file, and the definitions of its C API. The header file places the extern "C" functions inside namespace mlir::runtime, and the definitions are not in a namespace. This causes issues in cl.exe. It treats the declaration and definition as different, and thus does not apply dllexport to the definition, which leads to the mlir_async_runtime.dll containing no definitions, and the mlir_async_runtime.lib not being generated.

This patch moves the namespace to cover the definitions, and thus generates the dll correctly on Windows with cl.exe.

This was tested with Visual Studio C++ 19.28.29336.

Diff Detail

Event Timeline

mjp41 created this revision.Jan 25 2021, 11:59 AM
mjp41 requested review of this revision.Jan 25 2021, 11:59 AM
mjp41 added a comment.EditedJan 25 2021, 12:06 PM

The following minimised example shows the Visual Studio behaviour

C++
namespace foo
{
    extern "C" __declspec(dllexport) int f();
    extern "C" __declspec(dllexport) int g();
}

extern "C" int f()
{
    return 42;
}

namespace foo
{
    extern "C" int g()
    {
        return 42;
    }
}

Compiling this with cl.exe file.cc /LD will generate a DLL, but only containing g.

mjp41 abandoned this revision.Jan 25 2021, 12:17 PM

Closing as code has changed.

mjp41 updated this revision to Diff 319251.Jan 26 2021, 2:31 AM
mjp41 added a comment.Jan 26 2021, 2:35 AM

I am not sure what the correct approach to this should be given the new code for the JIT. Do you need to dllexport the __mlir_runner_init code, so that it appears in the DLL? Do you actually want the other symbols in the dll, or just use the export_symbol in the __mlir_runner_init?

Sorry, I am not familiar with the development here. I am just trying to unblock our Windows CI from taking a more up-to-date commit.

This is the error:

CMake Error at D:/a/1/s/build/Release/mlir/install/lib/cmake/mlir/MLIRTargets.cmake:892 (message):
  The imported target "mlir_async_runtime" references the file

     "D:/a/1/s/build/Release/mlir/install/lib/mlir_async_runtime.lib"

  but this file does not exist.

From what I can see, there MLIRTargets has LLVMIR, and there's a conversion between Async to LLVM (with registration of all those functions declared in AsyncRuntime.h) which I'm assuming is why MLIRTargets needs the async runtime.

I don't understand enough about the Async runtime to assert that the LLVM IR target really needs it. It could be just a linking dependency, not a real code dependency.

Closing as code has changed.

I don't understand what you mean by that. Last change to this file is last year.

Also, you may have uploaded an unrelated change (clang format) to this review.

mjp41 updated this revision to Diff 319255.Jan 26 2021, 2:56 AM

Ideally the library should only export __mlir_runner_destroy symbol (see https://reviews.llvm.org/D94312). Unfortunately I don't have an easy access to Windows build env, so I didn't properly cleanup exports for all symbols in that patch.

mjp41 added a comment.Jan 26 2021, 6:15 AM

@ezhulenev I think I can see what the fix should be. Does the check-mlir cover this example? Or is there additional testing I should do?

mjp41 updated this revision to Diff 319515.Jan 27 2021, 3:10 AM

Updated to export the symbols as suggested by @ezhulenev.

@ezhulenev I think I can see what the fix should be. Does the check-mlir cover this example? Or is there additional testing I should do?

It doesn't seem like check-mlir covers this case (or we'd see this failing, as there are windows-mlir builds), so if it Verona builds with it and nothing else breaks, it should be "ok". :)

My Windows DLL knowledge is not good enough to suggest a specific test for this, and if it was, I don't think check-mlir is the right place, as this is a system issue.

Perhaps, now that MLIR is in the monorepo, we could have MLIR execution tests in the test-suite, but that's certainly for another patch. :)

Regarding the implementation, this patch certainly seems cleaner than previously, so I'm happy with it, but @ezhulenev should check and approve.

mjp41 added a comment.EditedJan 27 2021, 3:55 AM

@rengolin, I manually checked the DLL symbols contained __mlir_runner_init and __mlir_runner_destroy and nothing else. But I didn't know a command line that I could use to test if this what the context for loading this was expecting.

...\llvm-project>dumpbin /exports ...\llvm-project\build-win\bin\mlir_async_runtime.dll
Microsoft (R) COFF/PE Dumper Version 14.28.29336.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file ...\llvm-project\build-win\bin\mlir_async_runtime.dll

File Type: DLL

  Section contains the following exports for mlir_async_runtime.dll

    00000000 characteristics
    FFFFFFFF time date stamp
        0.00 version
           1 ordinal base
           2 number of functions
           2 number of names

    ordinal hint RVA      name

          1    0 00009B8D __mlir_runner_destroy = @ILT+35720(__mlir_runner_destroy)
          2    1 0000641A __mlir_runner_init = @ILT+21525(__mlir_runner_init)

  Summary

        1000 .00cfg
        7000 .data
        1000 .didat
        4000 .idata
       25000 .pdata
       A8000 .rdata
        5000 .reloc
        1000 .rsrc
      218000 .text
        1000 .tls

Looking at the disassembly, the body __mlir_runner_init it is calling a lambda with pointers to the internal functions (e.g. mlirAsyncRuntimeDropRef), and those function exist inside the dll, but are not exported.

theraven accepted this revision.Jan 27 2021, 4:03 AM

Assuming @ezhulenev's description of the desired behaviour is correct (which the comment on line 366 appears to support), this patch looks right. I don't think we have tests for the sets of exported symbols anywhere else, which is a bit unfortunate.

I'm not particularly happy about the definition of API here, we should have a generic LLVM macro for this, but we don't and that's a separate issue.

This revision is now accepted and ready to land.Jan 27 2021, 4:03 AM
ezhulenev accepted this revision.Jan 27 2021, 4:07 AM

Thanks for the cleanup!

check-mlir runs few async tests using this runtime API as a part of mlir-cpu-runner tests. Also there are few integration tests and benchmarks for async dialect:

-DLLVM_ENABLE_PROJECTS=mlir
-DLLVM_ENABLE_THREADS=ON 
-DMLIR_INCLUDE_INTEGRATION_TESTS=ON
-DBUILD_SHARED_LIBS=ON

So -DBUILD_SHARED_LIBS=ON is not supported on Windows. I had the same failures with and without this patch for MLIR integration tests. So I have done as much as I can to know this is not going to regress anything.

This patch is fine, but I'd still be interested to understand if this could be made to show up during check-mlir with one of the Cmake configuration that can run on Windows?

This patch is fine, but I'd still be interested to understand if this could be made to show up during check-mlir with one of the Cmake configuration that can run on Windows?

If that's possible, that'd be awesome, as we seem to have hit it twice already in our monthly LLVM update. :)

mjp41 added a comment.Jan 29 2021, 3:16 AM

So after a little fiddling, I manually ran a test:

> bin\mlir-opt.exe ..\mlir\integration_test\Dialect\Async\CPU\test-async-parallel-for-1d.mlir -async-parallel-for -async-ref-counting  -convert-async-to-llvm  -convert-scf-to-std  -convert-std-to-llvm | bin\mlir-cpu-runner.exe -e entry -entry-point-result=void -O0 -shared-libs=bin\mlir_async_runtime.dll -shared-libs=bin\mlir_runner_utils.dll | bin\FileCheck --dump-input=always ..\mlir\integration_test\Dialect\Async\CPU\test-async-parallel-for-1d.mlir

Input file: <stdin>
Check file: ..\mlir\integration_test\Dialect\Async\CPU\test-async-parallel-for-1d.mlir

-dump-input=help explains the following input dump.

Input was:
<<<<<<
   1: Unranked Memref base@ = 000001C41C5D9E60 rank = 1 offset = 0 sizes = [9] strides = [1] data =
   2: [0, 1, 2, 3, 4, 5, 6, 7, 8]
   3: Unranked Memref base@ = 000001C41C5D9E60 rank = 1 offset = 0 sizes = [9] strides = [1] data =
   4: [0, 0, 2, 0, 4, 0, 6, 0, 8]
   5: Unranked Memref base@ = 000001C41C5D9E60 rank = 1 offset = 0 sizes = [9] strides = [1] data =
   6: [-20, 0, 0, -17, 0, 0, -14, 0, 0]
>>>>>>

This seems to have worked correctly. The names of libraries needed the lib prefix dropping from the command line, e.g. -shared-libs=bin\mlir_async_runtime.dll. Also, two assertions were tripped in

...\llvm-project\llvm\lib\ExecutionEngine\Orc\Core.cpp

Lines 915-918

assert(Flags ==
           (SymI->second.getFlags() &
            ~(JITSymbolFlags::Weak | JITSymbolFlags::Common)) &&
    "Resolved flags should match the declared flags");

and Lines 2633-2634

assert((KV.second.getFlags() & ~WeakFlags) == (I->second & ~WeakFlags) &&
   "Resolving symbol with incorrect flags");

I locally disabled these to see to get the run above.

mjp41 added a comment.Feb 2 2021, 9:36 AM

Just to follow up. The tests for the ExecutionEngine as far as I can see do not run on Windows. Getting tests for the ExecutionEngine working on Windows seems like a larger undertaking, than I can currently do, and would require checking a lot of code I am not familiar with. My limited experiment seems to imply the changes I have made would help with making the MLIR execution engine work on Windows.

@rengolin, can you merge as is?

@rengolin, can you merge as is?

I'm happy merging it as is. @mehdi_amini any further concerns?

LG, my previous comment was more aspirational, I didn't intend to hold this revision :)

LG, my previous comment was more aspirational, I didn't intend to hold this revision :)

Awesome, thanks!

This revision was automatically updated to reflect the committed changes.

It looks like this broke the windows mlir bot. Please revert it or fix it ASAP:

http://lab.llvm.org:8011/#/builders/13/builds/4013

It looks like this broke the windows mlir bot. Please revert it or fix it ASAP:

http://lab.llvm.org:8011/#/builders/13/builds/4013

It's a 2017 bug fixed in the 2019 version.

Sorry about that. We're working on a fix, thanks!