- User Since
- Sep 24 2015, 4:45 AM (177 w, 4 d)
Mon, Feb 11
Since there is only one other ICV that might be handled similarly (bind-var), I will not argue further for making thread-limit-var a special case. Let's leave it as it is.
LGTM, thanks for catching this!
The output is written to the build directory anyway:
Thu, Feb 7
Please provide test cases, so that we can understand the impact of this change.
I'm a bit confused about the explicit copy per task.
Mon, Feb 4
The code is updated to be compatible with OMPT as defined in OpenMP 5.0.
Fri, Jan 25
Jan 16 2019
The patch in general looks good to me.
Updated as requested
Jan 15 2019
Fix the last mention of ompt.h
Is this good to go for 8.0?
Jan 10 2019
Implemented the changes requested by Hansang.
Jan 9 2019
Does the observation in https://bugs.llvm.org/show_bug.cgi?id=36772 match your expectation regarding the remaining race?
Jan 8 2019
- I activated the omp_get_initial_device() call to get the number at runtime.
- I also renamed ompt.h to omp-tools.h, as defined in the spec.
Jan 7 2019
Implemented in r349458.
I forgot to include the review number in the commit message.
Jan 2 2019
Is the intention to have this ready for the 8.0 release?
Dec 18 2018
Dec 17 2018
Implemented most changes requested by Hansang
Dec 12 2018
Dec 11 2018
This will not compile with the current version of the OMPT interface; it needs updates first.
Nov 14 2018
Oct 19 2018
Sep 14 2018
Sep 11 2018
Sep 10 2018
Aug 24 2018
Aug 15 2018
This was held back from 7.0 to keep the implementation consistent with TR6.
Aug 13 2018
@Hahnfeld and I discussed the behavior and found it to be very inconsistent. Consider something like the following code, under the constraint that device(1) is busy (offloading to it will fail) and device(0) is free:
Aug 10 2018
Thanks for catching this. It seems like we need some Fortran tests for OMPT.
Aug 1 2018
Jul 31 2018
I second this change, since it should allow OpenMP to be included in the release build until someone can fix the OMPT build on Windows.
Jul 27 2018
I would like to commit this patch before the 7.0 branch.
Changing the name of the bit can be done in a later patch.
@jmellorcrummey, do you agree? If so, can you remove your request for changes?
I will apply the changes requested by hbae on commit.
I will apply the changes suggested by hbae on commit.
Jul 17 2018
The change is symmetric with what is implemented for CUDA.
Jul 5 2018
Jul 3 2018
As discussed on the mailing list, the flag should only be dropped on Mac OS. My pragmatic solution would be:
Jun 20 2018
Jun 18 2018
The OMPT tool can decide at multiple points to be inactive; here we look at:
The problem that becomes visible in this test case is the use of __builtin_frame_address(1), which is documented as unsafe.
Is there a better way to get the canonical frame address of the calling function? Also, the address returned by __builtin_frame_address seems to differ from the canonical frame address. How can we get the requested address?
Jun 7 2018
taskloop is actually a tasking construct and explicitly not a worksharing construct, so please move the test into the task directory.
Jun 4 2018
May 27 2018
May 10 2018
@dvyukov thanks for the detailed reasoning.
May 7 2018
The result of the discussion in the Tools WG was that the spec will keep omp_frame_t. I will submit this patch together with D43568.
Apr 23 2018
Reasons to keep the name archer (or archer-rt):
- We already have that name; we don't need to come up with a new one :)
- You can easily find related publications under that name, which explain the reasoning behind the library
- The programmer should never see the name, once the tool is integrated into the runtime workflow
Apr 20 2018
Apr 17 2018
I added a test case, but only for Linux, because I have no machine ready to test dl-loading on other operating systems.
Apr 11 2018
Mar 25 2018
I updated this differential to only export the function.
Feb 28 2018
Feb 23 2018
The idea was that the WAIT in line 11 would ensure that both initial threads had arrived.
But the runtime is actually not initialized before line 11. To fix the race, we need to call into the runtime in a way that makes both threads initial threads before the SIGNAL in line 9.
Feb 22 2018
As far as I know, the current implementation is very close to the specification in TR6.
Here there is a tiny difference between the spec and the implementation. I agree that we should not apply this patch if we roll back the change in the spec.
Feb 19 2018
Use the captured pattern as pointed out by Olga
Feb 17 2018
The initial issue addressed by this patch is resolved by D43195, so I removed the additional address printed for AArch64, while still allowing the test cases to match any printed address.
I applied clang-format on commit.
Feb 14 2018
This behavior is right (p.417,l.5):
Feb 9 2018
Feb 6 2018
Jan 30 2018
Jan 25 2018
NEEDS must be REQUIRES