This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] supports sparse_tensor.pack on libgen path
ClosedPublic

Authored by Peiming on Aug 15 2023, 12:02 PM.

Diff Detail

Event Timeline

Peiming created this revision.Aug 15 2023, 12:02 PM
Herald added a project: Restricted Project. · View Herald Transcript
Peiming requested review of this revision.Aug 15 2023, 12:02 PM
Peiming retitled this revision from [mlir][sparse] supports sparse_tensor.pack on libgen path.q to [mlir][sparse] supports sparse_tensor.pack on libgen path.Aug 15 2023, 12:03 PM
Peiming added reviewers: yinying-lisa-li, bixia.
Peiming updated this revision to Diff 550437.Aug 15 2023, 12:52 PM

add gpu tests.

aartbik accepted this revision.Aug 15 2023, 12:53 PM
aartbik added inline comments.
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
153

we could start using 1 again ;-)

mlir/include/mlir/ExecutionEngine/SparseTensor/Storage.h
654

strange edit?

1204

typo: coordinate

1217

yeah, a "swap" would be so awesome here but... maybe one day when we control runtime of external source too ;-)

mlir/lib/Dialect/SparseTensor/Transforms/CodegenUtils.cpp
665

I was about to comment that you can now remove the static util in SparseTensorCodegen.cpp
but you already did that ;-)

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp
216

that point ... (since pointers is plural)

1314

strange edit?

This revision is now accepted and ready to land.Aug 15 2023, 12:53 PM
Peiming updated this revision to Diff 550439.Aug 15 2023, 12:56 PM

fix asan.

Peiming updated this revision to Diff 550441.Aug 15 2023, 1:02 PM

revert unintended changes.

Peiming updated this revision to Diff 550442.Aug 15 2023, 1:04 PM
Peiming marked 5 inline comments as done.

address comments.

This revision was landed with ongoing or failed builds.Aug 15 2023, 1:21 PM
This revision was automatically updated to reflect the committed changes.

Since this change we have a test failing with a double free on our AArch64 bots: https://lab.llvm.org/buildbot/#/builders/179/builds/7122

This also occurs on our SVE bots, but they were failing to build for a time which masked the issue. The difference is that on the SVE bots the test runs natively, it's using QEMU otherwise (https://lab.llvm.org/buildbot/#/builders/184/builds/5631). Which means the failure isn't down to QEMU.

The 2 stage SVE bots aren't failing, but that's because they don't run the tests. You need to pass -DMLIR_INCLUDE_INTEGRATION_TESTS=True -DMLIR_RUN_ARM_SVE_TESTS=True to CMake to enable them.

ASAN results:

=================================================================
==3831480==ERROR: AddressSanitizer: attempting double-free on 0xffff956192a0 in thread T0:
    #0 0xaaaac7242ec4 in __interceptor_free /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-jade-03/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
    #1 0xffff9990d314  (<unknown module>)
    #2 0xffff9990d384  (<unknown module>)
    #3 0xaaaac7f2efc8 in compileAndExecute((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) /home/david.spickett/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:215:3
    #4 0xaaaac7f28754 in compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) /home/david.spickett/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:234:10
    #5 0xaaaac7f23c28 in mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) /home/david.spickett/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:391:23
    #6 0xaaaac72727ec in main /home/david.spickett/llvm-project/mlir/tools/mlir-cpu-runner/mlir-cpu-runner.cpp:33:10
    #7 0xffff9b140e0c in __libc_start_main /build/glibc-RIFKjK/glibc-2.31/csu/../csu/libc-start.c:308:16
    #8 0xaaaac71cc414 in _start (/home/david.spickett/build-llvm-aarch64/bin/mlir-cpu-runner+0x1afe414)

0xffff956192a0 is located 0 bytes inside of 88-byte region [0xffff956192a0,0xffff956192f8)
freed by thread T0 here:
    #0 0xaaaac7242ec4 in __interceptor_free /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-jade-03/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
    #1 0xffff9990d2c4  (<unknown module>)

previously allocated by thread T0 here:
    #0 0xaaaac7243158 in malloc /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-jade-03/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0xffff9990d024  (<unknown module>)
    #2 0xaaaac7f28754 in compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) /home/david.spickett/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:234:10
    #3 0xaaaac7f23c28 in mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) /home/david.spickett/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:391:23
    #4 0xaaaac72727ec in main /home/david.spickett/llvm-project/mlir/tools/mlir-cpu-runner/mlir-cpu-runner.cpp:33:10
    #5 0xffff9b140e0c in __libc_start_main /build/glibc-RIFKjK/glibc-2.31/csu/../csu/libc-start.c:308:16
    #6 0xaaaac71cc414 in _start (/home/david.spickett/build-llvm-aarch64/bin/mlir-cpu-runner+0x1afe414)

SUMMARY: AddressSanitizer: double-free /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-jade-03/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 in __interceptor_free
==3831480==ABORTING
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/david.spickett/build-llvm-aarch64/bin/FileCheck /home/david.spickett/llvm-project/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_pack_libgen.mlir

--

@DavidSpickett , thanks for reporting!

To avoid a full a revert: https://github.com/llvm/llvm-project/commit/51eaee3b42f057fc2cc3fae7ec71d44fa1b82057
Issue reported here: https://github.com/llvm/llvm-project/issues/64727

@Peiming and @aartbik - do you see where the "double free" could be coming from?

-Andrzej

@DavidSpickett , thanks for reporting!

To avoid a full a revert: https://github.com/llvm/llvm-project/commit/51eaee3b42f057fc2cc3fae7ec71d44fa1b82057
Issue reported here: https://github.com/llvm/llvm-project/issues/64727

@Peiming and @aartbik - do you see where the "double free" could be coming from?

-Andrzej

Thanks for reporting! I think you partial revert is the right fix: I forget to change the flag for VLA test and my local test environment can not cover that path either.