- User Since
- Feb 16 2016, 1:22 PM (125 w, 5 d)
Fri, Jul 13
I don't have any further comments.
Thu, Jul 12
Mon, Jul 9
Looks good. Do you think we should do the same for the generic-elf-64 plugin?
Fri, Jun 29
The addition looks good. I've seen this piece of code in quite a few places, maybe we need to refactor it into a mini-function. But let's revisit that in the future.
Mon, Jun 25
Looks good to me, thanks for submitting those fixes!
Mon, Jun 18
May 24 2018
I think this looks good. Definitely a correct implementation compared to what we have now!
May 22 2018
Looks good. I don't see any reason why performance would be affected and it is a necessary change for the sake of correctness.
May 21 2018
May 16 2018
Looks good! Thanks for taking care of it!
May 15 2018
May 14 2018
I agree with Jonas, the initial version of the nvptx RTL that we tried to upstream was somewhat like what you try to do here and the consensus was that we do not want to have these dependencies.
May 4 2018
May 2 2018
May I suggest that we use something like LIBOMPTARGET_DEVICE_RTL_DEBUG instead of LIBOMPDEVICE_DEBUG? I think this name is more descriptive of what the env var does.
Apr 27 2018
I don't think this code does atomic max. Actually, I don't think it does anything at all. First off, old_value and elem are variables which do not change inside the while-loop. So the first loop condition is either always true or always false. Secondly, atomicCAS returns the value that existed at address Buffer (regardless of whether or not the CAS operation succeeded) and not a boolean result. Also, old_value must be updated before each CAS attempt because some other thread may have changed it.
Apr 25 2018
Apr 16 2018
Apr 13 2018
Apr 12 2018
I think this is now good to go.
Apr 10 2018
Apr 5 2018
One caveat regarding Alexey's proposal: According to the CUDA programming guide, malloc on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.
Apr 2 2018
OK, so we can proceed with this solution and if we observe any performance problem in the future then I will push a different fix (instead of declaring __shared__ variables as extern, we can have getter functions in the .cu file in which the variable is defined which other .cu files can call to get a pointer/reference to the variable).
Mar 29 2018
Right, I remember this issue. It is caused by the fact that clang does not support __shared__ variables to be extern. This restriction was explicitly introduced here: https://reviews.llvm.org/D25125.
Mar 28 2018
What is the problem with building the bc library with the latest clang? I cannot reproduce the issue.
Mar 22 2018
Mar 21 2018
Mar 20 2018
Mar 16 2018
Rebase after bug fix.
Updated the patch to query the target offload kind from libomp instead of parsing OMP_TARGET_OFFLOAD.
Question: If the ICV is set to DEFAULT, then libomptarget must change it to either MANDATORY (if there are available devices in the system) or DISABLED (if there are no devices). Should libomptarget notify libomp about this change?
Waiting for the libomp-side patch for OMP_TARGET_OFFLOAD, I removed parsing this env var from within libomptarget and now this revision only implements the internal logic to handle whatever libomptarget will get by querying libomp. (there is already some code checking for OMP_TARGET_OFFLOAD==DISABLED, I'm leaving it there in order not to break systems which make use of this option right now)
Mar 15 2018
Right, I just read TR6 and everything is defined there. I'll revise the patch and upload a new diff.
This is obviously good to go!
Mar 14 2018
Mar 12 2018
Mar 9 2018
Mar 8 2018
I think the assignment makes more sense.
Mar 7 2018
I think the patch looks good. Let's remove the obsolete code in preparation for the new data-sharing scheme.
OK, I left the change to the internal device ID representation out of this patch.
Mar 6 2018
Mar 1 2018
As far as I am concerned, I agree that the existing data sharing scheme should be removed in favour of a simpler future patch which will introduce the more generic data sharing scheme.
Feb 12 2018
I don't have any other remarks, looks good.
Feb 8 2018
Feb 7 2018
Feb 6 2018
Looks good. Thanks for submitting! I'll take care of duplicate extern declarations in another patch.
Jan 30 2018