Page MenuHomePhabricator

Fixed non-deterministic GPU intrinsic lowering.
Needs RevisionPublic

Authored by dfki-jugr on Jan 30 2020, 7:46 AM.

Details

Summary

The current implementation of the intrinsic lowering phases seemed to be
non-determinsitic accross platforms. Multiple patterns with the same
priority have been matched differently on different platforms (Windows/
Linux) (see https://reviews.llvm.org/D73471). This patch circumvents
this issue by using an adjusted pattern-rewriter benefit to avoid
clashes.

Diff Detail

Unit TestsFailed

TimeTest
650 mslibc++.std/containers/sequences/array/array_creation::to_array.fail.cpp
Command: ['/usr/bin/clang++', '-o', '/dev/null', '-x', 'c++', '/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/libcxx/test/std/containers/sequences/array/array.creation/to_array.fail.cpp', '-c', '-v', '-ftemplate-depth=270', '-fsyntax-only', '-Xclang', '-verify', '-Xclang', '-verify-ignore-unexpected=note', '-ferror-limit=1024', '-Werror=thread-safety', '-std=c++2a', '-include', '/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/libcxx/test/support/nasty_macros.h', '-nostdinc++', '-I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/libcxx/include', '-I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/projects/libcxx/include/c++build', '-D__STDC_FORMAT_MACROS', '-D__STDC_LIMIT_MACROS', '-D__STDC_CONSTANT_MACROS', '-I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/libcxx/test/support', '-DLIBCXX_FILESYSTEM_STATIC_TEST_ROOT="/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/libcxx/test/std/input.output/filesystems/Inputs/static_test_env"', '-DLIBCXX_FILESYSTEM_DYNAMIC_TEST_ROOT="/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/projects/libcxx/test/filesystem/Output/dynamic_env"', '-DLIBCXX_FILESYSTEM_DYNAMIC_TEST_HELPER="/usr/bin/python /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/libcxx/test/support/filesystem_dynamic_test_helper.py"', '-D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER', '-Wall', '-Wextra', '-Werror', '-Wuser-defined-warnings', '-Wshadow', '-Wno-unused-command-line-argument', '-Wno-attributes', '-Wno-pessimizing-move', '-Wno-c++11-extensions', '-Wno-user-defined-literals', '-Wno-noexcept-type', '-Wsign-compare', '-Wunused-variable', '-Wunused-parameter', '-Wunreachable-code', '-Wno-error=user-defined-warnings', '-c'] Exit Code: 1 Standard Error:

Event Timeline

dfki-jugr created this revision.Jan 30 2020, 7:46 AM

Unit tests: fail. 62343 tests passed, 1 failed and 839 were skipped.

failed: libc++.std/containers/sequences/array/array_creation/to_array.fail.cpp

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

This doesn't look an ideal solution. The pattern list is supposed to be deterministic. Can you provide more details here? (ideally with repro instructions)

rriddle requested changes to this revision.Jan 30 2020, 9:40 AM
This revision now requires changes to proceed.Jan 30 2020, 9:40 AM
mehdi_amini requested changes to this revision.Jan 30 2020, 11:08 AM
mehdi_amini retitled this revision from Fixed non-deterministic GPU intrisnic lowering. to Fixed non-deterministic GPU intrinsic lowering..

This seems like hiding a bug, not fixing it, I'd really like us not to get down this road.

nicolasvasilache requested changes to this revision.Jan 31 2020, 1:53 PM

Same here, pattern benefits have important use cases. Twiddling them around to make a test pass is not an option.

herhut added a comment.Feb 4 2020, 2:16 AM

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

IIUC: This does not address the underlying issue with the framework though, it seems to still be in the category of "hiding the bug".

herhut added a comment.Feb 5 2020, 1:39 AM

Let's use dynamicallyIllegalOp instead to disallow the CPU intrinsic explicitly. That would also make the non-determinism go away by not relying on pattern order anymore. And it also captures the actual issue that the intrinsics call is indeed illegal.

IIUC: This does not address the underlying issue with the framework though, it seems to still be in the category of "hiding the bug".

No and no. It will not fix the underlying non-determinism issue, so that will have to be addressed. Independently, we should not rely on pattern insertion order for correctness either. So we should exclude the rewriting to LLVM intrinsics that are not available.

So let's get this correct and also fix the underlying issue.

If this is going to take more than a day or two to fix at this point, I think you should at least XFAIL the tests that are failing on Windows because any runs that include MLIR right now are red and will stay so until your fix lands. You can un-XFAIL them once you verify that they pass with the fix.

@mehdi_amini I fully get your point and I totally agree that we should not hide underlying bugs. However, you can argue as soon as two pattern matchers have the same benefit their ordering will be undefinied. In this case, the only possible solution you have, is to change the pattern matcher benefit.
@herhut We follow your arguments that we should unbreak the build on Windows asap. As mentioned above, this does not solve the general non-determinism issue. However, even if the ordering would be deterministic, we should not rely on intrinsic magic that might be easily broken by another commit in the future.

@rriddle I've tried to come up with a minimal example including a new dialect. However, the issue does not occur in simple cases on our testing machines. The easiest way to dive into this issue would be to move line 728 in LowerGPUOpsToNNVMOps.cc to 750:

// move this line
// populateWithGenerated(converter.getDialect()->getContext(), &patterns);
patterns
    .insert<GPUIndexIntrinsicOpLowering<gpu::ThreadIdOp, NVVM::ThreadIdXOp,
                                        NVVM::ThreadIdYOp, NVVM::ThreadIdZOp>,
            GPUIndexIntrinsicOpLowering<gpu::BlockDimOp, NVVM::BlockDimXOp,
                                        NVVM::BlockDimYOp, NVVM::BlockDimZOp>,
            GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, NVVM::BlockIdXOp,
                                        NVVM::BlockIdYOp, NVVM::BlockIdZOp>,
            GPUIndexIntrinsicOpLowering<gpu::GridDimOp, NVVM::GridDimXOp,
                                        NVVM::GridDimYOp, NVVM::GridDimZOp>,
            GPUAllReduceOpLowering, GPUShuffleOpLowering, GPUFuncOpLowering,
            GPUReturnOpLowering>(converter);
patterns.insert<OpToFuncCallLowering<AbsFOp>>(converter, "__nv_fabsf",
                                             "__nv_fabs");
patterns.insert<OpToFuncCallLowering<CeilFOp>>(converter, "__nv_ceilf",
                                             "__nv_ceil");
patterns.insert<OpToFuncCallLowering<CosOp>>(converter, "__nv_cosf",
                                             "__nv_cos");
patterns.insert<OpToFuncCallLowering<ExpOp>>(converter, "__nv_expf",
                                             "__nv_exp");
patterns.insert<OpToFuncCallLowering<TanhOp>>(converter, "__nv_tanhf",
                                              "__nv_tanh");
// to here
populateWithGenerated(converter.getDialect()->getContext(), &patterns);