- User Since
- Oct 12 2019, 11:44 AM (39 w, 1 d)
Tue, Jul 7
Mon, Jul 6
Tue, Jun 30
Mon, Jun 29
I happened to find that the huge overhead of cuMemFree might be because it is called while the data is still in use. I will come back to this patch after I fix that issue and re-evaluate whether we still need this.
Will update failed tests later
Thank you for the patch. Just one tiny comment: is this everything for the warning about the debug output?
Sun, Jun 28
Is there any side effect to this method? If not, I think it might be better to make it the default so that we don't need an extra environment variable to control it.
Wed, Jun 24
Just out of curiosity, would unordered_set work here? In theory, it should have better performance than set.
Mon, Jun 22
Thu, Jun 18
Fixed some issues and code style
Wed, Jun 17
Tue, Jun 16
Jun 5 2020
What is the concrete scenario in which you need this monitor thread to schedule device activity?
Did you consider using detached tasks to implement asynchronous target offloading?
Yes. In fact, it is natural to think of using detachable tasks. However, if the host side just creates a detachable task and moves on to some heavy workload, such as a work-sharing construct, no thread can pick up the task and execute it, which hurts the concurrency between host and device.
Jun 4 2020
Jun 3 2020
Thank you, Jon, for the review! The comments are really valuable.
Jun 2 2020
Updated function names to conform with LLVM code standards
May 28 2020
May 27 2020
Just copy the execution results from Summit.
May 10 2020
This patch fixed various race-condition issues and introduced the shadow concept. Unshackled tasks are now distributed among the unshackled threads rather than all being pushed to the master thread of the unshackled team.
May 6 2020
This patch basically did the following things:
- Rebased the source code;
- Changed the way the macro is defined to align with the existing method;
- Fixed an issue where taskwait could get stuck because the unshackled threads could not correctly steal tasks from the master thread.
May 3 2020
May 1 2020
Apr 14 2020
Apr 13 2020
Kept the parent-child chain so that taskwait works seamlessly. Also marked must_wait as true when unshackled tasks are enabled, because there might be a task outside of any parallel region, which means there can be outstanding tasks even though the team is serial.
Fixed a tiny compilation error
Apr 12 2020
Updated patch accordingly
Apr 11 2020
Switched back to omptarget_device_environmentTy to align with deviceRTLs
Update patch accordingly
Removed unused headers
Apr 10 2020
Removed friend functions. Later I'll prepare a new patch to improve encapsulation.
Apr 9 2020
Apr 7 2020
Thank you, Andrey, for your valuable comments.
The implementation behavior is not yet defined by the specification, so it will in any case be a pilot implementation. Or are you trying it out to help the specification progress?
That is a good question. The vision of course is to make it part of specification. :-)
"For now all tasks are unshackled" - good. But to me this looks like a central point of implementation.
Sorry for being unclear here. "All tasks are unshackled" means that, if you look at the source code, the flag is always set to 1. That is just for debugging, no other reason. Basically, unshackled tasks should not impact current regular tasks, at least until the spec says anything different.
What tasks should/can be given to unshackled threads? Is it user's decision, or runtime decision, or some combination (suggest new clause/hint?)?
That is a good point. For now, it can only be used by the compiler: if a nowait target is not in any parallel region, the compiler creates an unshackled task. The motivation, just for your information in case you don't know the context, is that "unshackled" here means the task is not bound to any parallel region, which is particularly useful for the case I mentioned. It also implies, especially in the current implementation design, that these tasks are actually implicitly bound to the same outermost parallel region. Regular task synchronization should NOT work for them, except for one special case: a taskwait that depends on an unshackled task. What's more, since it is not part of the spec, it cannot be used by users. Of course, if it becomes part of the spec in the future, users will be able to decide, and there will need to be a new clause for it.
If such a task produces other tasks, what do we do with them (which thread do we push them to, which team should execute them)? Tasks may have dependencies, affinity, and priority; what do we do with those? Should we exclude them from unshackled threads, or have some heuristics here as well? Etc.
Those are actually good open problems. From my perspective, a task generated by an unshackled task should also be unshackled. Dependencies are not a problem; they work the same as for regular tasks. As for the others, maybe we should exclude those parts. Unshackled tasks were not born to replace regular tasks; they are a convenient way to write some tasks without wrapping them into a parallel region, which is pretty awkward. If users require these fancy features, they should go to regular tasks.
Regarding platform (in)dependence, we do have an implementation of a monitor thread; a similar technique can be used for the unshackled master thread.
Could you please tell me which part is the monitor thread?
And it could not only sleep on a condition variable forever, but also do some periodic bookkeeping for its team, e.g. to support various wait policies.
It might depend on whether we will have various wait policies. :-)
Rebased my patch
Apr 6 2020
This one can be abandoned now...
@jdoerfert Please help add more people. :-)
Apr 5 2020
Remove else after return
Fixed an issue where arguments were passed to dataSubmit in the wrong order
Apr 4 2020
Added comments on why we need conditional statements in __tgt_rtl_data_retrieve and __tgt_rtl_data_submit