- User Since
- Jan 30 2014, 6:27 AM (364 w, 6 d)
wrong selection :(
LG, we need to backport this too. And we need to document that profiling multi-threaded applications needs this flag in the build.
LG, some suggestions for the final version.
please add to the backport list
This is not NFC but other than that reasonable. LG, @grokos ?
also need to be backported to 12
turn this build off by default if cuda is missing (what we used to have)
we need to put this in 12 as well, I think we need to open a bug that blocks the 12 meta-bug and start adding these patches (or git hashes) to be included in the release branch.
Does O1 create TBAA? we might need -O3 -Xclang -disable-llvm-optzns or -Xclang -disable-O0-optnone if we can also remove the noinline that way.
Tue, Jan 26
vzakhari removed a reviewer: jdoerfert.
I may be doing something wrong, but D95468 did not help very much looking at these numbers it seems...
LG, with the nits I mentioned before.
A few nits but given the opt-in semantics now I don't see a reason not to include this plugin (as we included other plugins before).
Mon, Jan 25
LG, with the nit @grokos pointed out.
Known issues resolved, AMDGPU is not yet a supported target and hard to test right now. LG
And all exported from libomp.so symbols supposed to be C symbols.
Yeah, I agree that mustprogress, argmemonly and nosync should imply willreturn. However, FuncAttrs currently infers neither nosync nor argmemonly, so it would need those first. I think that the argmemonly check here was there to cover some common intrinsics like llvm.memcpy, but those are explicitly willreturn now.
Sun, Jan 24
Sat, Jan 23
More profiling and selective at to avoid too coarse grained scopes
Logic seems sounds to me. Code as well, one minor nit below. I have no objections to this.
Fri, Jan 22
Still not perfect but getting there:
LGTM. Thanks for following up on this!
Switch the commit message with the subject though. LGTM
Can we add something along the lines of the commit message for D95260
Calls that aren't willreturn should not be eliminated, as this may remove an infinite loop.
One nit wrt. _v2 not _v2, otherwise LGTM.
Pretty much good to go. CMAKE variable, one potential change in the comment below, maybe name the declarations _v2.
Looks reasonable to me. Test coverage is nice now, thanks.
LGTM. Maybe wait a bit if someone else wants to take a look.
+1 on the motivation. When we landed the other patch we didn't consider all the implications and, so far, they clearly outweigh the benefits of having partial undef offsets "work".
Thu, Jan 21
Interesting. Should we do this instead of the pure macro stuff? Is this going to break if the argument type doesn't match the declaration parameter type, e.g.,
LGTM one nit
What is left after we merged the loop unswitch solution?
LGTM. FWIW, nonnull should imply noundef because if it was undef we could pick null.
Given the langref text I'm unsure what it means to link an append linkage variable to a non-append linkage one. I can see how extern is different but what would happen before & now if you remove the extern. And what happens before/now if you have append on both @var? We should add those tests if possible.
Wed, Jan 20
Look for CUDA functions on demand and once only, inform libomptarget if CUDA was not found.
Load the cuda functions at runtime
Looks OK to me, @Meinersbur can accept at some point.
LGTM, one comment below though.
LG, can you replace it with clang or OpenMP atomics next?
TBH, I feel "X is readonly and has side effects" sends the wrong message to begin with. It is a contradiction (in the IR world) as basically shown by the need for this patch. Given that there are no examples in-tree I don't understand why one would mark a side-effect intrinsic as readonly (or similar). Long story short, I would argue this should be a load error, not silently ignored.
At the moment this patch defines compatibility as exact string match of bundler entry ID.
Supporting target ID requires little more work and discussion.
Tue, Jan 19
Let's split this. Declare target in all the source files can go first, doesn't hurt anyone.
Forward declarations should also work fine with existing compilation, so that would be two.
Conditional defines and a SHARED(...) macro would be three.
Four is the cmake magic. Potentially a different folder to build it as C++ + OpenMP alternatively.
LGTM, I think this is correct. Thanks a lot!
We should replace the OMP_NUM_TEAMS handling in the plugins with this, right? Separate patch or can we do it right away?