
Tue, Jul 7

tianshilei1992 added a comment to D83269: [OpenMP] Identify GPU kernels (aka. OpenMP target regions).

LGTM.

Tue, Jul 7, 7:44 AM · Restricted Project
tianshilei1992 committed rGc5348aecd772: [OpenMP] Use primary context in CUDA plugin (authored by ye-luo).
[OpenMP] Use primary context in CUDA plugin
Tue, Jul 7, 7:15 AM
tianshilei1992 closed D82718: [OpenMP] Use primary context in CUDA plugin.
Tue, Jul 7, 7:14 AM · Restricted Project

Mon, Jul 6

tianshilei1992 added inline comments to D83271: [OpenMP] Replace function pointer uses in GPU state machine.
Mon, Jul 6, 7:44 PM · Restricted Project

Tue, Jun 30

tianshilei1992 added a comment to D82718: [OpenMP] Use primary context in CUDA plugin.

This patch drops the CU_CTX_SCHED_BLOCKING_SYNC property currently selected for the context. Is this intended? Should we add another function call to request this behavior for the primary context?

That is a good point. We depend on the synchronous behavior in some cases in the RTL.

Are you sure this is the right flag you need?

Tue, Jun 30, 7:32 PM · Restricted Project
tianshilei1992 added a comment to D82718: [OpenMP] Use primary context in CUDA plugin.

This patch drops the CU_CTX_SCHED_BLOCKING_SYNC property currently selected for the context. Is this intended? Should we add another function call to request this behavior for the primary context?

Tue, Jun 30, 1:03 PM · Restricted Project

Mon, Jun 29

tianshilei1992 added a comment to D81054: [OpenMP] Introduce target memory manager.

I found that the huge overhead of cuMemFree might be because it is called while the data is still in use. I will come back to this patch after I fix that issue and re-evaluate whether we still need it.

Mon, Jun 29, 10:00 PM · Restricted Project
tianshilei1992 updated the summary of D77609: [OpenMP] Added the support for unshackled task in RTL.
Mon, Jun 29, 10:00 PM · Restricted Project
tianshilei1992 updated the diff for D77609: [OpenMP] Added the support for unshackled task in RTL.
Mon, Jun 29, 10:00 PM · Restricted Project
tianshilei1992 committed rG45bb073da8ef: [OpenMP] fix clang warning about printf format in CUDA plugin (authored by ye-luo).
[OpenMP] fix clang warning about printf format in CUDA plugin
Mon, Jun 29, 7:47 PM
tianshilei1992 closed D82789: [OpenMP] fix clang warning about printf format in CUDA plugin.
Mon, Jun 29, 7:47 PM · Restricted Project
tianshilei1992 accepted D82789: [OpenMP] fix clang warning about printf format in CUDA plugin.

LGTM

Mon, Jun 29, 7:47 PM · Restricted Project
tianshilei1992 updated the diff for D77609: [OpenMP] Added the support for unshackled task in RTL.
Mon, Jun 29, 2:12 PM · Restricted Project
tianshilei1992 updated the diff for D78075: [Clang][OpenMP] Added support for nowait target in CodeGen.

Will update failed tests later

Mon, Jun 29, 1:36 PM · Restricted Project
tianshilei1992 updated the diff for D77609: [OpenMP] Added the support for unshackled task in RTL.
Mon, Jun 29, 1:36 PM · Restricted Project
tianshilei1992 updated the diff for D81989: [OpenMP] Introduce low level dependency process to target offloading.
Mon, Jun 29, 1:36 PM · Restricted Project
tianshilei1992 added a comment to D82789: [OpenMP] fix clang warning about printf format in CUDA plugin.

Thank you for the patch. Just one tiny comment: does this cover all the warnings about the debug output?

Mon, Jun 29, 1:01 PM · Restricted Project

Sun, Jun 28

tianshilei1992 added a comment to D82718: [OpenMP] Use primary context in CUDA plugin.

Is there any side effect of this method? If not, I think it might be better to make it the default so that we don't need an extra environment variable to control it.

Sun, Jun 28, 8:54 PM · Restricted Project

Wed, Jun 24

tianshilei1992 updated the summary of D81989: [OpenMP] Introduce low level dependency process to target offloading.
Wed, Jun 24, 1:34 PM · Restricted Project
tianshilei1992 added a comment to D82264: [OpenMP] Adopt std::set in HostDataToTargetMap.

The current lookup function needs to search by range instead of exact value. I need upper_bound functionality.
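
To illustrate the point about range lookups, here is a minimal sketch of finding the entry whose address range contains a given pointer in a `std::set`. The names `HostRange`, `RangeSet`, `addRange`, and `findContaining` are hypothetical and are not taken from the libomptarget sources; the sketch also assumes ranges do not overlap. An ordered container supports this through `upper_bound`, whereas a hash-based `unordered_set` has no ordering to search against.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical entry: one mapped host range [begin, end).
struct HostRange {
  std::uintptr_t begin;
  std::uintptr_t end;
};

// Order entries by start address only; ranges are assumed not to overlap.
struct ByBegin {
  bool operator()(const HostRange &a, const HostRange &b) const {
    return a.begin < b.begin;
  }
};

using RangeSet = std::set<HostRange, ByBegin>;

// Inserts [begin, end); returns false if an entry with the same start exists.
bool addRange(RangeSet &ranges, std::uintptr_t begin, std::uintptr_t end) {
  assert(begin < end && "empty or inverted range");
  return ranges.insert(HostRange{begin, end}).second;
}

// Returns the entry whose range contains addr, or nullptr if there is none.
const HostRange *findContaining(const RangeSet &ranges, std::uintptr_t addr) {
  // upper_bound yields the first entry that starts strictly after addr.
  RangeSet::const_iterator it = ranges.upper_bound(HostRange{addr, addr});
  if (it == ranges.begin())
    return nullptr; // every entry starts after addr
  --it;             // last entry that starts at or before addr
  return addr < it->end ? &*it : nullptr;
}
```

The lookup costs O(log n) comparisons. An exact-match query on a hash set would be faster on average, but it cannot answer "which range contains this address".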

Wed, Jun 24, 10:15 AM · Restricted Project
tianshilei1992 committed rG6e5f64c44f26: [OpenMP] Adopt std::set in HostDataToTargetMap (authored by ye-luo).
[OpenMP] Adopt std::set in HostDataToTargetMap
Wed, Jun 24, 9:44 AM
tianshilei1992 closed D82264: [OpenMP] Adopt std::set in HostDataToTargetMap.
Wed, Jun 24, 9:43 AM · Restricted Project
tianshilei1992 added a comment to D82264: [OpenMP] Adopt std::set in HostDataToTargetMap.

Just out of curiosity, will unordered_set work here? In theory, it will have better performance than set.

Wed, Jun 24, 9:09 AM · Restricted Project

Mon, Jun 22

tianshilei1992 updated the diff for D81989: [OpenMP] Introduce low level dependency process to target offloading.
Mon, Jun 22, 11:49 AM · Restricted Project

Thu, Jun 18

tianshilei1992 added a comment to D81054: [OpenMP] Introduce target memory manager.

I think this optimization can be an option, but it should not replace the existing scheme of directly allocating and freeing memory.
Applications may request device memory outside OpenMP and use the vendor's native programming model or libraries.
Having libomptarget hold large amounts of memory doesn't make sense.
You may consider using the pool only for small allocation requests (<1M).
It is the application's responsibility to take care of large memory allocations.
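
The suggestion above, pooling only small requests and passing large ones straight through, could be sketched roughly as follows. This is an illustration under stated assumptions, not the design in D81054: `SmallBlockPool` and `PoolThreshold` are hypothetical names, `std::malloc`/`std::free` stand in for the device allocation calls, and the pool reuses only blocks of exactly the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <unordered_map>
#include <vector>

// Requests below this size are pooled; larger ones bypass the pool.
constexpr std::size_t PoolThreshold = std::size_t(1) << 20; // 1 MiB

class SmallBlockPool {
public:
  SmallBlockPool() = default;
  SmallBlockPool(const SmallBlockPool &) = delete;
  SmallBlockPool &operator=(const SmallBlockPool &) = delete;

  // Frees cached blocks only; blocks still held by callers are not touched.
  ~SmallBlockPool() {
    for (auto &bucket : FreeLists)
      for (void *p : bucket.second)
        std::free(p); // stand-in for the device free call
  }

  void *allocate(std::size_t size) {
    assert(size > 0 && "zero-sized request");
    if (size >= PoolThreshold)
      return std::malloc(size); // large request: never cached
    void *p = nullptr;
    auto it = FreeLists.find(size);
    if (it != FreeLists.end() && !it->second.empty()) {
      p = it->second.back(); // reuse a cached block of the same size
      it->second.pop_back();
    } else {
      p = std::malloc(size); // stand-in for the device alloc call
    }
    if (p)
      LiveSizes[p] = size; // remember the size so release() can cache it
    return p;
  }

  void release(void *p) {
    auto it = LiveSizes.find(p);
    if (it == LiveSizes.end()) {
      std::free(p); // large block: return it immediately
      return;
    }
    FreeLists[it->second].push_back(p); // small block: keep for reuse
    LiveSizes.erase(it);
  }

  // Number of blocks currently cached for reuse.
  std::size_t cachedBlocks() const {
    std::size_t n = 0;
    for (const auto &bucket : FreeLists)
      n += bucket.second.size();
    return n;
  }

private:
  std::map<std::size_t, std::vector<void *>> FreeLists;
  std::unordered_map<void *, std::size_t> LiveSizes;
};
```

With this split, the runtime never holds on to large blocks, so memory the application wants to hand to a vendor library is returned as soon as it is released.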

Thu, Jun 18, 3:54 PM · Restricted Project
tianshilei1992 updated the summary of D81989: [OpenMP] Introduce low level dependency process to target offloading.
Thu, Jun 18, 9:49 AM · Restricted Project
tianshilei1992 updated the diff for D81989: [OpenMP] Introduce low level dependency process to target offloading.

Fixed some issues and code style

Thu, Jun 18, 7:41 AM · Restricted Project

Wed, Jun 17

tianshilei1992 committed rGaaf50adb539d: Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info" (authored by tianshilei1992).
Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info"
Wed, Jun 17, 12:24 PM
tianshilei1992 added a reverting change for rGee1bf45e1d42: [OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info: rGaaf50adb539d: Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info".
Wed, Jun 17, 12:24 PM
tianshilei1992 committed rGee1bf45e1d42: [OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info (authored by tianshilei1992).
[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info
Wed, Jun 17, 11:51 AM
tianshilei1992 added a comment to D81989: [OpenMP] Introduce low level dependency process to target offloading.

Is there some design documentation on this? It's tricky to distinguish intent from quirks of CUDA.

Amdgcn is built on the 'heterogeneous system architecture' model, which has a fair amount of support for managing graphs of tasks but also has challenging forward progress properties. I'm not immediately sure it would share much code with the nvptx implementation.

Wed, Jun 17, 8:03 AM · Restricted Project

Tue, Jun 16

tianshilei1992 created D81989: [OpenMP] Introduce low level dependency process to target offloading.
Tue, Jun 16, 8:46 PM · Restricted Project

Jun 5 2020

tianshilei1992 added a comment to D77609: [OpenMP] Added the support for unshackled task in RTL.

What is the concrete scenario in which you need this monitor thread to schedule device activity?
Did you consider using detached tasks to implement asynchronous target offloading?

Yes. In fact, it is natural to think about using detachable tasks. However, if the host side just creates a detachable task and then moves on to some heavy workload, such as a work-sharing construct, no thread can pick up the task and execute it, which hurts the concurrency between host and device.

Jun 5 2020, 7:11 AM · Restricted Project

Jun 4 2020

tianshilei1992 committed rGa014fbbc219f: [OpenMP] Improve D2D memcpy to use more efficient driver API (authored by tianshilei1992).
[OpenMP] Improve D2D memcpy to use more efficient driver API
Jun 4 2020, 2:24 PM
tianshilei1992 closed D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.
Jun 4 2020, 2:24 PM · Restricted Project

Jun 3 2020

tianshilei1992 added a comment to D81054: [OpenMP] Introduce target memory manager.

Thank you Jon for the review! The comments are really precious.

Jun 3 2020, 7:49 PM · Restricted Project
tianshilei1992 added inline comments to D81054: [OpenMP] Introduce target memory manager.
Jun 3 2020, 7:49 PM · Restricted Project
tianshilei1992 committed rG8bd7e4188a09: Replace separator in OpenMP variant name mangling. (authored by LukasSommerTu).
Replace separator in OpenMP variant name mangling.
Jun 3 2020, 1:48 PM
tianshilei1992 closed D80439: Replace separator in OpenMP variant name mangling..
Jun 3 2020, 1:46 PM · Restricted Project, Restricted Project

Jun 2 2020

tianshilei1992 created D81054: [OpenMP] Introduce target memory manager.
Jun 2 2020, 10:25 PM · Restricted Project
tianshilei1992 updated the diff for D81054: [OpenMP] Introduce target memory manager.

Updated function names to conform with LLVM code standards

Jun 2 2020, 10:25 PM · Restricted Project
tianshilei1992 updated the diff for D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.

Updated documentation

Jun 2 2020, 4:29 PM · Restricted Project

May 28 2020

tianshilei1992 updated the diff for D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.
May 28 2020, 5:07 PM · Restricted Project

May 27 2020

tianshilei1992 updated the diff for D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.
May 27 2020, 5:30 PM · Restricted Project
tianshilei1992 added inline comments to D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.
May 27 2020, 4:22 PM · Restricted Project
tianshilei1992 added a comment to D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.

Just copy the execution results from Summit.

May 27 2020, 11:23 AM · Restricted Project
tianshilei1992 created D80649: [OpenMP] Improve D2D memcpy to use more efficient driver API.
May 27 2020, 11:23 AM · Restricted Project

May 10 2020

tianshilei1992 updated the diff for D77609: [OpenMP] Added the support for unshackled task in RTL.

This patch tries to fix various race conditions and introduces the shadow concept. Unshackled tasks are now distributed to different unshackled threads rather than all being pushed to the master thread of the unshackled team.

May 10 2020, 6:05 PM · Restricted Project

May 6 2020

tianshilei1992 updated the diff for D77609: [OpenMP] Added the support for unshackled task in RTL.

This patch does the following:

  1. Rebased the source code;
  2. Changed how the macro is defined to align with the existing method;
  3. Fixed an issue where taskwait could get stuck because the unshackled threads could not correctly steal tasks from the master thread.
May 6 2020, 6:12 PM · Restricted Project

May 3 2020

tianshilei1992 committed rGcb038927ef5a: [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices (authored by tianshilei1992).
[OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices
May 3 2020, 1:17 PM
tianshilei1992 closed D79255: [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices.
May 3 2020, 1:17 PM · Restricted Project

May 1 2020

tianshilei1992 created D79255: [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices.
May 1 2020, 5:11 PM · Restricted Project

Apr 14 2020

tianshilei1992 added a comment to D78075: [Clang][OpenMP] Added support for nowait target in CodeGen.

You need to update the tests too.

Apr 14 2020, 9:06 AM · Restricted Project

Apr 13 2020

tianshilei1992 created D78075: [Clang][OpenMP] Added support for nowait target in CodeGen.
Apr 13 2020, 8:06 PM · Restricted Project
tianshilei1992 updated the diff for D77609: [OpenMP] Added the support for unshackled task in RTL.

Kept the parent-child chain so that taskwait works seamlessly. Also mark must_wait as true if unshackled tasks are enabled, because we might have a task outside of any parallel region, which means even though the team is serial

Apr 13 2020, 5:57 PM · Restricted Project
tianshilei1992 committed rG4031bb982b7a: [OpenMP] Refined CUDA plugin to put all CUDA operations into class (authored by tianshilei1992).
[OpenMP] Refined CUDA plugin to put all CUDA operations into class
Apr 13 2020, 10:46 AM
tianshilei1992 closed D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.
Apr 13 2020, 10:46 AM · Restricted Project
tianshilei1992 updated the diff for D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.

Fixed a tiny compilation error

Apr 13 2020, 9:39 AM · Restricted Project
tianshilei1992 added a comment to D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.

I assume you tested this locally and it works as expected? If so, I think we can go ahead. One comment to address now below.

Apr 13 2020, 9:39 AM · Restricted Project

Apr 12 2020

tianshilei1992 added inline comments to D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.
Apr 12 2020, 6:10 PM · Restricted Project
tianshilei1992 updated the diff for D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.

Updated patch accordingly

Apr 12 2020, 10:40 AM · Restricted Project
tianshilei1992 added inline comments to D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.
Apr 12 2020, 10:40 AM · Restricted Project

Apr 11 2020

tianshilei1992 updated the diff for D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.

Switched back to omptarget_device_environmentTy to align with deviceRTLs

Apr 11 2020, 7:44 PM · Restricted Project
tianshilei1992 updated the diff for D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.

Update patch accordingly

Apr 11 2020, 7:44 PM · Restricted Project
tianshilei1992 added inline comments to D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.
Apr 11 2020, 7:44 PM · Restricted Project
tianshilei1992 updated the diff for D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.

Removed useless headers

Apr 11 2020, 11:43 AM · Restricted Project
tianshilei1992 updated the summary of D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.
Apr 11 2020, 11:43 AM · Restricted Project
tianshilei1992 created D77951: [OpenMP] Refined CUDA plugin to put all CUDA operations into class.
Apr 11 2020, 11:43 AM · Restricted Project
tianshilei1992 committed rGfeed674deca1: [OpenMP] Introduce stream pool to make sure the correctness of device synchr... (authored by tianshilei1992).
[OpenMP] Introduce stream pool to make sure the correctness of device synchr...
Apr 11 2020, 4:15 AM
tianshilei1992 closed D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.
Apr 11 2020, 4:14 AM · Restricted Project

Apr 10 2020

tianshilei1992 updated the diff for D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.
Apr 10 2020, 5:47 PM · Restricted Project
tianshilei1992 added inline comments to D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.
Apr 10 2020, 5:47 PM · Restricted Project
tianshilei1992 added inline comments to D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.
Apr 10 2020, 11:11 AM · Restricted Project
tianshilei1992 updated the diff for D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.
Apr 10 2020, 11:06 AM · Restricted Project
tianshilei1992 added inline comments to D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.
Apr 10 2020, 10:12 AM · Restricted Project
tianshilei1992 updated the diff for D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.

Removed friend functions. Later I'll prepare a new patch to improve encapsulation.

Apr 10 2020, 9:26 AM · Restricted Project

Apr 9 2020

tianshilei1992 committed rG03ff643d2e9e: [OpenMP] Put old APIs back and added new _async series for backward… (authored by tianshilei1992).
[OpenMP] Put old APIs back and added new _async series for backward…
Apr 9 2020, 7:48 PM
tianshilei1992 closed D77822: [OpenMP] Put old APIs back and added new _async series for backward compatibility.
Apr 9 2020, 7:48 PM · Restricted Project
tianshilei1992 added inline comments to D77822: [OpenMP] Put old APIs back and added new _async series for backward compatibility.
Apr 9 2020, 5:39 PM · Restricted Project
tianshilei1992 updated the summary of D77822: [OpenMP] Put old APIs back and added new _async series for backward compatibility.
Apr 9 2020, 1:20 PM · Restricted Project
tianshilei1992 created D77822: [OpenMP] Put old APIs back and added new _async series for backward compatibility.
Apr 9 2020, 1:18 PM · Restricted Project

Apr 7 2020

tianshilei1992 added a comment to D77609: [OpenMP] Added the support for unshackled task in RTL.

Thank you, Andrey, for your valuable comments.

The implementation behavior is not yet defined by the specification, so it will in any case be a pilot implementation. Or are you trying to help the specification progress?

That is a good question. The vision, of course, is to make it part of the specification. :-)

"For now all tasks are unshackled" - good. But to me this looks like a central point of implementation.

Sorry for being unclear here. "All tasks are unshackled" means that, if you look at the source code, the flag is always set to 1. That is just for debugging, no other reason. Basically, unshackled tasks should not impact current regular tasks, at least until the spec says otherwise.

What tasks should/can be given to unshackled threads? Is it user's decision, or runtime decision, or some combination (suggest new clause/hint?)?

That is a good point. For now, it can only be used by the compiler. If a nowait target is not in any parallel region, the compiler will create an unshackled task. For context, in case you are not familiar with the motivation: "unshackled" here means the task is not bound to any parallel region, which is particularly useful for the case I mentioned. Of course, it also implies that, especially in the current implementation design, these tasks are implicitly bound to the same outermost parallel region. Regular task synchronization should NOT work for them, except for one special case in which a taskwait depends on an unshackled task. What's more, since it is not part of the spec, it cannot be used by users. If it becomes part of the spec in the future, users can decide, and there will need to be a new clause for it.
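
For readers unfamiliar with the case being discussed, the following is a small sketch of a nowait target region that is not nested in any parallel region. The function name `sumOfSquares` is hypothetical. Whether a given compiler and runtime turn this into an unshackled task is an implementation detail that is not visible in the source; the sketch only shows the shape of the user code. Without OpenMP enabled the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the squares of the input inside a deferred target region.
long sumOfSquares(const std::vector<int> &values) {
  assert(!values.empty() && "need at least one element to map");
  const int *data = values.data();
  const std::size_t n = values.size();
  long sum = 0;

  // Not enclosed in any parallel region: with nowait, the encountering
  // thread may continue past this construct before the region finishes.
  #pragma omp target nowait map(to: data[0:n]) map(tofrom: sum)
  for (std::size_t i = 0; i < n; ++i)
    sum += static_cast<long>(data[i]) * data[i];

  // The host could do unrelated work here while the region executes.

  // Wait for the deferred target task before reading the result.
  #pragma omp taskwait
  return sum;
}
```

The taskwait is what makes reading `sum` safe; it corresponds to the special case mentioned above in which a taskwait depends on the deferred target task.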

If such a task produces other tasks, what should be done with them (which thread to push them to, which team should execute them)? Tasks may have dependencies, affinity, and priority; what should be done with those? Should we exclude them from unshackled threads, or have some heuristics here as well? Etc.

Those are actually good open problems. From my perspective, a task generated by an unshackled task should also be unshackled. Dependencies are not a problem; they work the same as with regular tasks. As for the others, maybe we should exclude those parts. Unshackled tasks are not meant to replace regular tasks. They are a convenient way to write some tasks without wrapping them in a parallel region, which is pretty weird. If users require these fancy features, they should use regular tasks.

Regarding platform (in)dependence, we do have an implementation of the monitor thread; a similar technique can be used for the unshackled master thread.

Could you please tell me which part is the monitor thread?

And it could not only sleep on condition variable forever, but do some periodic bookkeeping for its team, e.g. to support various wait policies.

It might depend on whether we will have various wait policies. :-)

Apr 7 2020, 2:10 PM · Restricted Project
tianshilei1992 updated the diff for D77412: [OpenMP] Introduce stream pool to make sure the correctness of device synchronization.

Rebase my patch

Apr 7 2020, 1:03 PM · Restricted Project
tianshilei1992 committed rG32ed29271fd8: [OpenMP] Optimized stream selection by scheduling data mapping for the same… (authored by tianshilei1992).
[OpenMP] Optimized stream selection by scheduling data mapping for the same…
Apr 7 2020, 11:59 AM
tianshilei1992 closed D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.
Apr 7 2020, 11:58 AM · Restricted Project
tianshilei1992 updated the summary of D77609: [OpenMP] Added the support for unshackled task in RTL.
Apr 7 2020, 10:50 AM · Restricted Project
tianshilei1992 abandoned D70010: [OpenMP][Offloading] Replaced default stream with an actual per-device unblocking stream in NVPTX implementation.
Apr 7 2020, 7:34 AM · Restricted Project, Restricted Project

Apr 6 2020

tianshilei1992 added a comment to D70010: [OpenMP][Offloading] Replaced default stream with an actual per-device unblocking stream in NVPTX implementation.

Superseded by D74145. You can abandon this one.

Apr 6 2020, 8:44 PM · Restricted Project, Restricted Project
tianshilei1992 added a comment to D70010: [OpenMP][Offloading] Replaced default stream with an actual per-device unblocking stream in NVPTX implementation.

This one can be abandoned now...

Apr 6 2020, 5:29 PM · Restricted Project, Restricted Project
tianshilei1992 added a comment to D77609: [OpenMP] Added the support for unshackled task in RTL.

@jdoerfert Please help add more people. :-)

Apr 6 2020, 4:55 PM · Restricted Project
tianshilei1992 updated the summary of D77609: [OpenMP] Added the support for unshackled task in RTL.
Apr 6 2020, 4:55 PM · Restricted Project
tianshilei1992 created D77609: [OpenMP] Added the support for unshackled task in RTL.
Apr 6 2020, 4:55 PM · Restricted Project

Apr 5 2020

tianshilei1992 updated the diff for D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.

Remove else after return

Apr 5 2020, 12:16 PM · Restricted Project
tianshilei1992 added inline comments to D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.
Apr 5 2020, 11:44 AM · Restricted Project
tianshilei1992 added inline comments to D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.
Apr 5 2020, 10:40 AM · Restricted Project
tianshilei1992 updated the diff for D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.

Fixed an issue where arguments were passed to dataSubmit in the wrong order

Apr 5 2020, 6:23 AM · Restricted Project

Apr 4 2020

tianshilei1992 added a comment to D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.

I think this is good. Do you happen to have some runtime results?

Apr 4 2020, 8:46 PM · Restricted Project
tianshilei1992 updated the diff for D77005: [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream.

Added comments on why we need conditional statements in __tgt_rtl_data_retrieve and __tgt_rtl_data_submit

Apr 4 2020, 12:13 PM · Restricted Project