
Wed, Oct 21

ye-luo added a comment to D89844: [Clang][OpenMP] Fixed an issue of segment fault when using target nowait.

Getting this even when compiling without offload. You can use the reproducer from the original bug report.

clang++: /home/yeluo/opt/llvm-clang/llvm-project/llvm/include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/packages/llvm/master-patched/bin/clang++ -DADD_ -DH5_USE_16_API -DHAVE_CONFIG_H -Drestrict=__restrict__ -I/home/yeluo/opt/miniqmc/src -I/home/yeluo/opt/miniqmc/build_clang_offlaod_nowait/src -fopenmp -fomit-frame-pointer -fstrict-aliasing -D__forceinline=inline -march=native -O3 -DNDEBUG -ffast-math -std=c++11 -o CMakeFiles/qmcutil.dir/Utilities/tinyxml/tinyxml2.cpp.o -c /home/yeluo/opt/miniqmc/src/Utilities/tinyxml/tinyxml2.cpp 
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'CallGraph Pass Manager' on module '/home/yeluo/opt/miniqmc/src/Utilities/tinyxml/tinyxml2.cpp'.
4.	Running pass 'Combine redundant instructions' on function '@_ZN8tinyxml27XMLUtil10IsNameCharEh'
 #0 0x0000000001ecc523 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/packages/llvm/master-patched/bin/clang+++0x1ecc523)
 #1 0x0000000001eca25e llvm::sys::RunSignalHandlers() (/home/packages/llvm/master-patched/bin/clang+++0x1eca25e)
 #2 0x0000000001ecb8cd llvm::sys::CleanupOnSignal(unsigned long) (/home/packages/llvm/master-patched/bin/clang+++0x1ecb8cd)
 #3 0x0000000001e513b3 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) (/home/packages/llvm/master-patched/bin/clang+++0x1e513b3)
 #4 0x0000000001e514ee CrashRecoverySignalHandler(int) (/home/packages/llvm/master-patched/bin/clang+++0x1e514ee)
 #5 0x00007f18f56923c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
 #6 0x00007f18f512718b raise /build/glibc-ZN95T4/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #7 0x00007f18f5106859 abort /build/glibc-ZN95T4/glibc-2.31/stdlib/abort.c:81:7
 #8 0x00007f18f5106729 get_sysdep_segment_value /build/glibc-ZN95T4/glibc-2.31/intl/loadmsgcat.c:509:8
 #9 0x00007f18f5106729 _nl_load_domain /build/glibc-ZN95T4/glibc-2.31/intl/loadmsgcat.c:970:34
#10 0x00007f18f5117f36 (/lib/x86_64-linux-gnu/libc.so.6+0x36f36)
#11 0x00000000019f4c00 llvm::InstCombinerImpl::foldOrOfICmps(llvm::ICmpInst*, llvm::ICmpInst*, llvm::BinaryOperator&) (/home/packages/llvm/master-patched/bin/clang+++0x19f4c00)
#12 0x00000000019fb023 llvm::InstCombinerImpl::visitOr(llvm::BinaryOperator&) (/home/packages/llvm/master-patched/bin/clang+++0x19fb023)
#13 0x00000000019d354c llvm::InstCombinerImpl::run() (/home/packages/llvm/master-patched/bin/clang+++0x19d354c)
#14 0x00000000019d5788 combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) (/home/packages/llvm/master-patched/bin/clang+++0x19d5788)
#15 0x00000000019d70b1 llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (/home/packages/llvm/master-patched/bin/clang+++0x19d70b1)
#16 0x00000000017c7a68 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/packages/llvm/master-patched/bin/clang+++0x17c7a68)
#17 0x00000000010d0033 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (/home/packages/llvm/master-patched/bin/clang+++0x10d0033)
#18 0x00000000017c8117 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/packages/llvm/master-patched/bin/clang+++0x17c8117)
#19 0x00000000020fed4a clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/home/packages/llvm/master-patched/bin/clang+++0x20fed4a)
#20 0x0000000002d29c9c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/packages/llvm/master-patched/bin/clang+++0x2d29c9c)
#21 0x00000000037e77e3 clang::ParseAST(clang::Sema&, bool, bool) (/home/packages/llvm/master-patched/bin/clang+++0x37e77e3)
#22 0x00000000026dc383 clang::FrontendAction::Execute() (/home/packages/llvm/master-patched/bin/clang+++0x26dc383)
#23 0x000000000266e4f2 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/packages/llvm/master-patched/bin/clang+++0x266e4f2)
#24 0x0000000002789bb2 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/packages/llvm/master-patched/bin/clang+++0x2789bb2)
#25 0x0000000000a4568c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/packages/llvm/master-patched/bin/clang+++0xa4568c)
#26 0x0000000000a437ec ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/home/packages/llvm/master-patched/bin/clang+++0xa437ec)
#27 0x0000000002523de2 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const::$_1>(long) (/home/packages/llvm/master-patched/bin/clang+++0x2523de2)
#28 0x0000000001e512c7 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/packages/llvm/master-patched/bin/clang+++0x1e512c7)
#29 0x00000000025234f7 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const (/home/packages/llvm/master-patched/bin/clang+++0x25234f7)
#30 0x00000000024efd28 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/home/packages/llvm/master-patched/bin/clang+++0x24efd28)
#31 0x00000000024f0247 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/home/packages/llvm/master-patched/bin/clang+++0x24f0247)
#32 0x0000000002509758 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/home/packages/llvm/master-patched/bin/clang+++0x2509758)
#33 0x0000000000a43158 main (/home/packages/llvm/master-patched/bin/clang+++0xa43158)
#34 0x00007f18f51080b3 __libc_start_main /build/glibc-ZN95T4/glibc-2.31/csu/../csu/libc-start.c:342:3
#35 0x0000000000a404de _start (/home/packages/llvm/master-patched/bin/clang+++0xa404de)
clang-12: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 12.0.0 (https://github.com/llvm/llvm-project.git ca73dcd8a9ed9cc3ca1c1cc97ab893747791a681)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/packages/llvm/master-patched/bin
clang-12: note: diagnostic msg: 
********************
Wed, Oct 21, 11:40 AM · Restricted Project
ye-luo added a comment to D89041: [libc++] Include <__config_site> from <__config>.

Fails at make install.

CMake Error at projects/libcxx/include/cmake_install.cmake:753 (file):

file INSTALL cannot find
"/scratch/opt/llvm-clang/build_mirror_offload_nightly/projects/libcxx/__config_site":
No such file or directory.

This change is surprisingly tricky to land. Thanks for the heads up.

Should be fixed in

commit b5aa67446e01bd277727b05710a42e69ac41e74b
Author: Louis Dionne <ldionne@apple.com>
Date:   Wed Oct 21 12:53:24 2020 -0400

    [libc++] Fix the installation of libc++ headers since the __config_site change

I'm still getting build errors from an existing build that worked earlier today and links against a previously installed libcxx.

/home2/3n4/clang/bin/../include/c++/v1/__config:13:10: fatal error: '__config_site' file not found
#include <__config_site>

When I try a fresh build using an install script,

In file included from /home2/3n4/llvm/trunk/llvm-project/compiler-rt/lib/fuzzer/FuzzerCrossOver.cpp:11:
/home2/3n4/llvm/trunk/llvm-project/compiler-rt/lib/fuzzer/FuzzerDefs.h:14:10: fatal error: cassert: No such file or directory
 #include <cassert>
Wed, Oct 21, 10:40 AM · Restricted Project, Restricted Project
ye-luo added a comment to D89041: [libc++] Include <__config_site> from <__config>.

Fails at make install.

Wed, Oct 21, 9:44 AM · Restricted Project, Restricted Project

Fri, Oct 16

ye-luo added a comment to D77609: [OpenMP] Added the support for unshackled task in RTL.

Enabled unshackled thread by default

Fri, Oct 16, 4:31 PM · Restricted Project

Wed, Oct 7

ye-luo accepted D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

LGTM

Wed, Oct 7, 7:55 PM · Restricted Project, Restricted Project

Tue, Oct 6

ye-luo added inline comments to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.
Tue, Oct 6, 10:13 PM · Restricted Project, Restricted Project
ye-luo added inline comments to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.
Tue, Oct 6, 9:40 PM · Restricted Project, Restricted Project
ye-luo requested changes to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.
Tue, Oct 6, 8:01 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

An alternative approach is to build the deviceRTL for multiple cuda versions and then pick whichever one is the best fit when compiling application code. That has advantages when building the deviceRTL libraries on a different machine to the one that intends to use it.

Cmake isn't my thing, but I see that my trunk build only has libomptarget-nvptx-sm_35.bc when the local card is a sm_50. The downstream amd toolchain builds lots of this library, my install dir has fifteen of them (including sm_50).

Tue, Oct 6, 5:07 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

It is probably best not to touch enable_language(CUDA) at the moment. Could you just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake`?

That only controls loading the library, since this is where we set all the CUDA options I think it's fine to call it here.

Tue, Oct 6, 4:09 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

I just realized that this patch affects clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, could you explain why the detection is not put together with the other CUDA setup in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

If we're sticking with using FindCUDA it's definitely redundant here since it was already called by the time we get here. The support for CUDA language would use the same method but have enable_language(CUDA) somewhere instead of find_package(CUDA)

Tue, Oct 6, 3:54 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

I just realized that this patch affects clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, could you explain why the detection is not put together with the other CUDA setup in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

Tue, Oct 6, 3:36 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

CMake 3.18 introduces CMAKE_CUDA_ARCHITECTURES. Does 3.18 support detection? If we know of a new way that works since 3.18, I think supporting both with an if-else makes sense.

Tue, Oct 6, 3:13 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

The link I posted indicates that the independent feature has been merged since 3.12. It is better to avoid deprecated features when introducing new CMake lines, even though some existing lines may rely on deprecated CMake.

Tue, Oct 6, 3:04 PM · Restricted Project, Restricted Project
ye-luo added a comment to D88929: [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default.

FindCUDA has been deprecated.
Please explore the following feature without directly calling FindCUDA.
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1856

Tue, Oct 6, 2:50 PM · Restricted Project, Restricted Project

Mon, Sep 28

ye-luo added a comment to D88384: [OpenMP][FIX] Verify compatible types for declare variant calls.

The minimal reproducer and full app work now.

Mon, Sep 28, 7:04 AM · Restricted Project

Sep 23 2020

ye-luo added inline comments to D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL .
Sep 23 2020, 6:08 PM · Restricted Project
ye-luo added inline comments to D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL .
Sep 23 2020, 5:11 PM · Restricted Project
ye-luo requested review of D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL .
Sep 23 2020, 3:14 PM · Restricted Project

Sep 21 2020

ye-luo updated the diff for D87980: [OpenMP] Protect unrecogonized CUDA error code.

Should be good to go now.

Sep 21 2020, 9:47 AM · Restricted Project
ye-luo updated the diff for D87980: [OpenMP] Protect unrecogonized CUDA error code.

After a bit more experimentation, the return status of cuGetErrorString can be something other than CUDA_SUCCESS or CUDA_ERROR_INVALID_VALUE.
In this particular case, when CUDA has been deinitialized, the error code can no longer be translated by cuGetErrorString.
So now errStr is printed only when the status is CUDA_SUCCESS.
CUDA_ERROR_INVALID_VALUE is treated differently from the generic != CUDA_SUCCESS case.
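
The three-way handling described above could be sketched roughly as follows. This is not the code from D87980: `FakeResult`, `fakeGetErrorString`, and `describeError` are hypothetical stand-ins so the sketch builds without the CUDA headers, the enum values are placeholders, and the "deinitialized" status returned by the mock is an assumption about how the lookup fails.

```cpp
#include <string>

// Hypothetical stand-ins for CUresult values; the numbers are placeholders,
// not the real driver API values.
enum FakeResult {
  FAKE_SUCCESS,
  FAKE_ERROR_INVALID_VALUE,
  FAKE_ERROR_DEINITIALIZED,
  FAKE_ERROR_OUT_OF_MEMORY
};

// Hypothetical mock of the string lookup. It translates one known code,
// reports INVALID_VALUE for codes it does not recognize, and fails with a
// different status once the driver is no longer alive (an assumption).
FakeResult fakeGetErrorString(int Code, bool DriverAlive, const char **Str) {
  if (!DriverAlive)
    return FAKE_ERROR_DEINITIALIZED;
  if (Code == FAKE_ERROR_OUT_OF_MEMORY) {
    *Str = "out of memory";
    return FAKE_SUCCESS;
  }
  return FAKE_ERROR_INVALID_VALUE;
}

// Only use the translated string when the lookup itself succeeded, and
// distinguish "unrecognized code" from any other lookup failure.
std::string describeError(int Code, bool DriverAlive) {
  const char *Str = nullptr;
  FakeResult Lookup = fakeGetErrorString(Code, DriverAlive, &Str);
  if (Lookup == FAKE_SUCCESS)
    return std::string("CUDA error is: ") + Str;
  if (Lookup == FAKE_ERROR_INVALID_VALUE)
    return "Unrecognized CUDA error code: " + std::to_string(Code);
  return "Unresolved CUDA error code: " + std::to_string(Code) +
         " (lookup failed with status " +
         std::to_string(static_cast<int>(Lookup)) + ")";
}
```

The point of the third branch is that `Str` is never dereferenced unless the lookup reported success.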

Sep 21 2020, 9:33 AM · Restricted Project
ye-luo added a comment to D87980: [OpenMP] Protect unrecogonized CUDA error code.

Hold on a second. I'm looking into the error message a bit more.

Sep 21 2020, 9:18 AM · Restricted Project
ye-luo added a comment to D87980: [OpenMP] Protect unrecogonized CUDA error code.

The root cause is a known issue and I put up a bug report to track the status.
https://bugs.llvm.org/show_bug.cgi?id=47595
Anyway, this patch should be sufficient for users at the moment.

Sep 21 2020, 8:07 AM · Restricted Project

Sep 19 2020

ye-luo updated the diff for D87980: [OpenMP] Protect unrecogonized CUDA error code.
Sep 19 2020, 8:44 PM · Restricted Project
ye-luo requested review of D87980: [OpenMP] Protect unrecogonized CUDA error code.
Sep 19 2020, 8:31 PM · Restricted Project

Sep 14 2020

ye-luo added a comment to D78075: [Clang][OpenMP] Added support for nowait target in CodeGen via regular task.

However, an OpenMP task has the problem that it must be within
a parallel region; otherwise the task will be executed immediately. As a
result, if we directly wrap it in a regular task, a nowait target outside of a
parallel region is still synchronous.

The spec says an implicit task can be generated by an implicit parallel region, which can be the whole OpenMP program. For this reason, the need for an explicit parallel region is a limitation of the LLVM OpenMP runtime, right?

Can I have an option to run the nowait region as a regular task instead of an unshackled task? So I can use "parallel" and well established ways to control the thread affinity.

According to the spec, an implicit parallel region is an inactive parallel region that is not generated from a parallel construct. And based on the definition of active parallel region, which is a parallel region that is executed by a team consisting of more than one thread, an inactive parallel region only has one thread. Since we only have one thread, if we encounter a task, executing it immediately does make sense as we don't have another thread to execute it.

If I remember correctly, you may yield the thread inside a target region after enqueuing kernels and transfers. So even with 1 thread, there is a chance to run other tasks without finishing this target. Isn't that possible?

Sep 14 2020, 6:11 PM · Restricted Project
ye-luo added a comment to D78075: [Clang][OpenMP] Added support for nowait target in CodeGen via regular task.

However, an OpenMP task has the problem that it must be within
a parallel region; otherwise the task will be executed immediately. As a
result, if we directly wrap it in a regular task, a nowait target outside of a
parallel region is still synchronous.

Sep 14 2020, 3:51 PM · Restricted Project

Sep 10 2020

ye-luo added inline comments to D87165: [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins.
Sep 10 2020, 6:23 AM · Restricted Project

Sep 8 2020

ye-luo accepted D87165: [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins.

The changes I requested have been added, so I am removing my block. Other reviewers' comments still need to be addressed.

Sep 8 2020, 8:27 AM · Restricted Project

Sep 4 2020

ye-luo added a comment to D87165: [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins.

Added additional comments. Should I add them to the doxygen notes at the top?

Sep 4 2020, 2:58 PM · Restricted Project
ye-luo requested changes to D87165: [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins.
Sep 4 2020, 2:32 PM · Restricted Project

Sep 1 2020

ye-luo added inline comments to D77609: [OpenMP] Added the support for unshackled task in RTL.
Sep 1 2020, 2:00 PM · Restricted Project

Aug 31 2020

ye-luo added a comment to D86804: [OpenMP] Consolidate error handling and debug messages in Libomptarget.

It seems that the functions are marked static, so they should be OK. However, including the whole Debug.h in a plugin cpp makes it feel OK to use any function/macro from the header file. But actually only some of the macros are for the plugins; others are only for libomptarget.

I'm not sure we want to make a distinction, the point is to move to a unified debug/message model. You can choose not only the level of information but also the kind of output (text, json, ...). The messages will then be tied to the webpage via enums, which allows all plugins to emit the same message for the same thing with the same link to more information. There will certainly be things that are only used in libomptarget or the plugins, but I don't see how that is any worse than duplicating the parts that are used by both.

I didn't mean to duplicate anything. Instead you need multiple header files: one for the common parts, one for libomptarget, and one for the plugins. The latter two both include the first one. Later, when you expand the OFFLOAD_XXX signals, they can be added to the common file. The return signal is generated by the plugins and captured by libomptarget. Some users may want to see only the messages captured universally by libomptarget. Some users still want to see the native error message. So the libomptarget and plugin side error handling still needs to be separated.

I fail to see why this machinery is necessary to emit only messages from one place and not the other. I am not against a hierarchy of headers per se, but right now, and maybe also later, there seems to be little point.
I mean, we need to introduce a new env variable, actually two, that allow separate control. Once we have that we can argue about separation.
Alternatively, I would have suggested to define the "location" prior to including the debug header, e.g.,:

#define DEBUG_LOCATION "PLUGIN"
#include "Debug.h"

which we verify like:

#if DEBUG_LOCATION != "PLUGIN" and DEBUG_LOCATION != "OMPTARGET"
#error ...
#endif

At the end of the day I want to simplify things. A single location for all our debug needs sounds simpler than 1 + #plugins to me, even if we don't use all functionality at each location. If separation does not allow anything we cannot reasonably do in a single location, I doubt it provides a benefit.

Aug 31 2020, 3:22 PM · Restricted Project
ye-luo added a comment to D86804: [OpenMP] Consolidate error handling and debug messages in Libomptarget.
Aug 31 2020, 3:05 PM · Restricted Project
ye-luo added a comment to D86804: [OpenMP] Consolidate error handling and debug messages in Libomptarget.

It seems that the functions are marked static, so they should be OK. However, including the whole Debug.h in a plugin cpp makes it feel OK to use any function/macro from the header file. But actually only some of the macros are for the plugins; others are only for libomptarget.

I'm not sure we want to make a distinction, the point is to move to a unified debug/message model. You can choose not only the level of information but also the kind of output (text, json, ...). The messages will then be tied to the webpage via enums, which allows all plugins to emit the same message for the same thing with the same link to more information. There will certainly be things that are only used in libomptarget or the plugins, but I don't see how that is any worse than duplicating the parts that are used by both.

Aug 31 2020, 2:30 PM · Restricted Project
ye-luo added a comment to D86804: [OpenMP] Consolidate error handling and debug messages in Libomptarget.

It seems that the functions are marked static, so they should be OK. However, including the whole Debug.h in a plugin cpp makes it feel OK to use any function/macro from the header file. But actually only some of the macros are for the plugins; others are only for libomptarget.

Aug 31 2020, 10:10 AM · Restricted Project
ye-luo added a comment to D86804: [OpenMP] Consolidate error handling and debug messages in Libomptarget.

I don't feel right about having Debug.h shared by libomptarget and the plugins, especially when Debug.h contains not just macros but also functions.

Aug 31 2020, 9:41 AM · Restricted Project

Aug 26 2020

ye-luo added a comment to D86483: [OpenMP] Always emit debug messages that indicate offloading failure.

Please document the flags in the patch summary.

Aug 26 2020, 9:48 AM · Restricted Project

Aug 24 2020

ye-luo accepted D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.

I would prefer PrivateArgumentManagerTy to be moved into its own files.
The rest looks good to me.

Aug 24 2020, 2:26 PM · Restricted Project
ye-luo added inline comments to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.
Aug 24 2020, 10:44 AM · Restricted Project

Aug 23 2020

ye-luo added a comment to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.

Down the road, we may need a way to allocate host pinned memory via the plugin for the host buffer to maximize transfer performance.

Aug 23 2020, 9:41 PM · Restricted Project

Aug 20 2020

ye-luo added a comment to D81054: [OpenMP] Introduce target memory manager.

As a heads up, I'm told this breaks amdgpu tests. @ronlieb is looking at the merge from upstream, don't have any more details at this time. The basic idea of wrapping device alloc seems likely to be sound for all targets so I'd guess we've run into a bug in this patch.

Aug 20 2020, 6:33 PM · Restricted Project
ye-luo added a comment to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.

Only minor things.

Aug 20 2020, 3:58 PM · Restricted Project
ye-luo added inline comments to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.
Aug 20 2020, 3:18 PM · Restricted Project
ye-luo added inline comments to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.
Aug 20 2020, 1:51 PM · Restricted Project
ye-luo added inline comments to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.
Aug 20 2020, 12:47 PM · Restricted Project
ye-luo added a comment to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.

Why just "small" ones? why not all of them?

In addition to the last paragraph of the new commit message, we also have to copy the data on the host in the right place. That is not free as the size grows.
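
The trade-off can be illustrated with a rough sketch of threshold-based packing. This is not the D86307 implementation: `FirstPrivateArg`, `PackedArgs`, `packSmallArgs`, and the `Threshold` parameter are hypothetical names, and alignment of the packed arguments is ignored for brevity.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one first-private argument on the host.
struct FirstPrivateArg {
  const void *HostPtr;
  std::size_t Size;
};

// One contiguous host buffer for the small arguments, plus bookkeeping.
struct PackedArgs {
  std::vector<char> Buffer;          // packed bytes, sent in a single transfer
  std::vector<std::size_t> Offsets;  // offset in Buffer of each packed argument
  std::vector<std::size_t> Unpacked; // indices of arguments left out
};

// Copy every argument no larger than Threshold into one buffer. The host-side
// memcpy is the cost that grows with argument size, which is why large
// arguments are left for their own transfers.
PackedArgs packSmallArgs(const std::vector<FirstPrivateArg> &Args,
                         std::size_t Threshold) {
  PackedArgs Result;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (Args[I].Size > Threshold) {
      Result.Unpacked.push_back(I);
      continue;
    }
    std::size_t Offset = Result.Buffer.size();
    Result.Offsets.push_back(Offset);
    Result.Buffer.resize(Offset + Args[I].Size);
    if (Args[I].Size != 0)
      std::memcpy(Result.Buffer.data() + Offset, Args[I].HostPtr,
                  Args[I].Size);
  }
  return Result;
}
```

Under these assumptions the packed buffer replaces several small transfers with one, at the price of one extra host copy per packed argument.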

Aug 20 2020, 12:12 PM · Restricted Project
ye-luo added a comment to D86307: [OpenMP] Pack first-private arguments to improve efficiency of data transfer.

Why just "small" ones? why not all of them?

Aug 20 2020, 11:35 AM · Restricted Project

Aug 19 2020

ye-luo accepted D81054: [OpenMP] Introduce target memory manager.

LGTM

Aug 19 2020, 1:18 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 19 2020, 12:34 PM · Restricted Project
ye-luo accepted D86238: [OpenMP] Refactored the function `DeviceTy::data_exchange`.

LGTM

Aug 19 2020, 12:03 PM · Restricted Project

Aug 18 2020

ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 18 2020, 8:47 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 18 2020, 8:27 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 18 2020, 8:01 PM · Restricted Project
ye-luo added a comment to D81054: [OpenMP] Introduce target memory manager.

In addition,

  1. The DeviceTy copy constructor and assignment operator were already imperfect before this patch. I don't think we can fix them in this patch. We should just document the imperfection here.
  2. Because the memory limit is per allocation, it seems that the MemoryManager can still hold an unbounded amount of memory, and we have no way to free it. I'm concerned about having this feature on by default.
Aug 18 2020, 6:57 PM · Restricted Project

Aug 13 2020

ye-luo added a comment to D84470: [OpenMP 5.0] Fix user-defined mapper privatization in tasks.

What is the current status of this patch?
@lildmh could you update this patch? I'd like to test it against
https://bugs.llvm.org/show_bug.cgi?id=47122

Aug 13 2020, 3:00 PM · Restricted Project

Aug 12 2020

ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 12 2020, 6:23 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 12 2020, 5:31 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 12 2020, 5:09 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 12 2020, 4:54 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 12 2020, 4:49 PM · Restricted Project
ye-luo added inline comments to D81054: [OpenMP] Introduce target memory manager.
Aug 12 2020, 10:22 AM · Restricted Project
ye-luo requested changes to D81054: [OpenMP] Introduce target memory manager.

Blocking the patch temporarily pending answers to my earlier questions.

Aug 12 2020, 10:11 AM · Restricted Project
ye-luo added a comment to D81054: [OpenMP] Introduce target memory manager.
  1. Please mention LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD, default value and unit in the patch summary.
  2. Is it possible to have a unit test testing the manager class behaviors?
  3. Can we offload to host and run address sanitizer or valgrind?

I'm not sure if I'm asking for too much here.

Aug 12 2020, 10:05 AM · Restricted Project

Jul 31 2020

ye-luo accepted D84996: [OpenMP] Fixed the issue that target memory deallocation might be called when they're being used.

LGTM

Jul 31 2020, 2:06 PM · Restricted Project

Jul 30 2020

ye-luo added a comment to D84996: [OpenMP] Fixed the issue that target memory deallocation might be called when they're being used.

Thanks for fixing the bug. It should be good for the moment.
Considering that recursive mappers exist, we may still have more synchronization than needed. I think recursing through the whole targetDataBegin/targetDataEnd is a convenient but sub-optimal choice.
Recursion should only be done in the map/mapper analysis. I am just leaving my thoughts here; this needs a discussion beyond this patch.

Jul 30 2020, 7:02 PM · Restricted Project
ye-luo accepted D84816: [OpenMP] Refactored the function `target`.

LGTM.

Jul 30 2020, 5:06 PM · Restricted Project
ye-luo accepted D84991: [OpenMP] Refactored the function `targetDataEnd`.

LGTM. Please mention renaming variables in the summary.

Jul 30 2020, 5:04 PM · Restricted Project

Jul 29 2020

ye-luo accepted D84767: [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region..

LGTM. My applications run as expected now. PR46824, PR46012, PR46868 all work fine.

Jul 29 2020, 2:09 PM · Restricted Project, Restricted Project
ye-luo added a comment to D84816: [OpenMP] Refactored the function `target`.

Only minor documentation issues.

Jul 29 2020, 9:32 AM · Restricted Project

Jul 28 2020

ye-luo added a comment to D84767: [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region..

This patch
GPU activities: 96.99% 350.05ms 10 35.005ms 1.5680us 350.00ms [CUDA memcpy HtoD]
before the July21 change
GPU activities: 95.33% 20.317ms 4 5.0793ms 1.6000us 20.305ms [CUDA memcpy HtoD]
Still more transfer than there should be.

Jul 28 2020, 4:13 PM · Restricted Project, Restricted Project
ye-luo added a comment to D84767: [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region..

This patch
GPU activities: 96.99% 350.05ms 10 35.005ms 1.5680us 350.00ms [CUDA memcpy HtoD]
before the July21 change
GPU activities: 95.33% 20.317ms 4 5.0793ms 1.6000us 20.305ms [CUDA memcpy HtoD]
Still more transfer than there should be.

Jul 28 2020, 3:12 PM · Restricted Project, Restricted Project
ye-luo accepted D84799: [OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard`.

LGTM

Jul 28 2020, 2:16 PM · Restricted Project
ye-luo accepted D84797: [NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard.

OK. Leave the unrelated renaming to the future.

Jul 28 2020, 2:15 PM · Restricted Project
ye-luo added a comment to D84799: [OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard`.

Only one minor issue. Your initial, more complex patch made me think you had replaced all the lock/unlock calls. After splitting, the change is very clean.

Jul 28 2020, 2:04 PM · Restricted Project
ye-luo added a comment to D84797: [NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard.

Should be easy to address my comments and let us get this merged ASAP.

Jul 28 2020, 1:56 PM · Restricted Project
ye-luo added a comment to D84778: [OpenMP] Refactor the `target` function.

I don't think it deserves three patches. The goal is to refactor the target function, and this patch does only that. According to the bi-weekly meeting, the renaming could go in with other related changes.

In addition, should we update target_data_update as well?

I didn't touch target_data_update. Basically I only take care of related code.

Jul 28 2020, 12:57 PM · Restricted Project
ye-luo requested changes to D84767: [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region..

Please check the reproducer in https://bugs.llvm.org/show_bug.cgi?id=46868 with LIBOMPTARGET_DEBUG=1.
The reference counting on the base pointer variable has side effects. It is not cleaned up when these variables leave their scope.

Jul 28 2020, 12:42 PM · Restricted Project, Restricted Project
ye-luo requested changes to D84778: [OpenMP] Refactor the `target` function.

This patch needs to be split into three:

  1. Function renaming. In addition, should we update target_data_update as well?
  2. std::lock_guard change.
  3. "target" change.

The order of 1 and 2 is flexible.

Jul 28 2020, 11:30 AM · Restricted Project
ye-luo added a comment to D84182: [OPENMP]Fix PR46012: declare target pointer cannot be accessed in target region..

@ABataev:

After this patch was committed, I tried to run the following example:

#include <stdio.h>

int *yptr;

int main() {
  int y[10];
  y[1] = 1;
  yptr = &y[0];

  printf("&yptr = %p\n", &yptr);
  printf("&y[0] = %p\n", &y[0]);

  #pragma omp target data map(to: yptr[0:5])
  #pragma omp target
  {
    printf("y = %d\n", yptr[1]);
    yptr[1] = 10;
    printf("y = %d\n", yptr[1]);
  }

  printf("y = %d\n", yptr[1]);
  return 0;
}

The arguments clang generates are:

1) base = &y[0], begin = &yptr, size = 8, type = TARGET_PARAM | TO
2) base = &yptr, begin = &y[0], size = 8, type = PTR_AND_OBJ | TO

The second argument is correct, but the first doesn't make much sense. I believe it should have its base set to &yptr, not &y[0].
y[0] is not the base for anything; it is only the pointee object.

Jul 28 2020, 6:33 AM · Restricted Project

Jul 24 2020

ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 3:37 PM · Restricted Project
ye-luo updated the diff for D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 3:36 PM · Restricted Project
ye-luo added a comment to D82074: [OpenMP] Set cmake policies CMP0074 and CMP0075 to NEW.

The CMake version was bumped to 3.14.

Jul 24 2020, 2:09 PM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 2:06 PM · Restricted Project
ye-luo updated the diff for D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 2:03 PM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 12:34 PM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 10:32 AM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 9:45 AM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 8:51 AM · Restricted Project
ye-luo updated the diff for D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 8:19 AM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 7:28 AM · Restricted Project
ye-luo added inline comments to D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 24 2020, 6:57 AM · Restricted Project

Jul 23 2020

ye-luo created D84487: [OpenMP] Add more pass-through functions in DeviceTy.
Jul 23 2020, 11:02 PM · Restricted Project

Jul 22 2020

ye-luo accepted D84381: [OpenMP] Wait for kernel prior to memory deallocation.
Jul 22 2020, 9:48 PM · Restricted Project
ye-luo added a comment to D84381: [OpenMP] Wait for kernel prior to memory deallocation.

OK. It is less broken now.
target_data_end still does Device.deallocTgtPtr and needs a sync before it.
To fully fix this issue, target_data_end must be split.

Jul 22 2020, 9:46 PM · Restricted Project
ye-luo added a comment to D84381: [OpenMP] Wait for kernel prior to memory deallocation.

Indeed, target_data_begin should be split as well. cudaMalloc blocks the whole device. Alternating cudaMalloc and transfers only makes the whole process even slower. It is better to do all the allocations first and then start queuing the transfers.
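The ordering argument can be sketched on the host without a GPU. This is only an illustration under stated assumptions: `interleaved` and `twoPhase` are hypothetical helpers that record the order of operations as strings, they do not call CUDA or libomptarget, and the buffer names are made up. The sketch contrasts one blocking allocation between every pair of transfers with all allocations first, followed by all transfers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of operations instead of calling a real runtime.
using OpLog = std::vector<std::string>;

// Interleaved: every transfer is preceded by a (blocking) allocation.
OpLog interleaved(const std::vector<std::string> &Buffers) {
  OpLog Log;
  for (const std::string &Name : Buffers) {
    Log.push_back("alloc " + Name);
    Log.push_back("copy " + Name);
  }
  assert(Log.size() == 2 * Buffers.size() && "one alloc and one copy each");
  return Log;
}

// Two-phase: perform all allocations first, then queue all transfers.
OpLog twoPhase(const std::vector<std::string> &Buffers) {
  OpLog Log;
  for (const std::string &Name : Buffers)
    Log.push_back("alloc " + Name);
  for (const std::string &Name : Buffers)
    Log.push_back("copy " + Name);
  assert(Log.size() == 2 * Buffers.size() && "one alloc and one copy each");
  return Log;
}
```

In the two-phase order, no allocation sits between queued transfers, so the transfers can be issued back to back once the allocations are done.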

Jul 22 2020, 8:12 PM · Restricted Project
ye-luo added a comment to D84381: [OpenMP] Wait for kernel prior to memory deallocation.

Does it mean the D2H will always run synchronously after this change?
Does it also mean that target_data_end should be split into data transfer and data free parts?

Jul 22 2020, 7:59 PM · Restricted Project

Jul 21 2020

ye-luo updated subscribers of D84182: [OPENMP]Fix PR46012: declare target pointer cannot be accessed in target region..
Jul 21 2020, 8:30 AM · Restricted Project
ye-luo accepted D84182: [OPENMP]Fix PR46012: declare target pointer cannot be accessed in target region..

I verified that PR46012 is fixed with this patch.

Jul 21 2020, 8:29 AM · Restricted Project

Jul 19 2020

ye-luo added a comment to D82719: [OpenMPOpt][SplitMemTransfer][WIP] Getting values stored in offload arrays.

Could you describe what "SplitMemTransfer" is? The current patch summary doesn't provide sufficient explanation. What kinds of offload arrays are considered by this optimization?

Jul 19 2020, 6:54 PM · Restricted Project, Restricted Project