- User Since
- Jan 30 2014, 6:27 AM (363 w, 3 d)
I can see that we want this but I guess the errors show the problem, replace __attribute((device)) with DEVICE and define it based on CUDA vs OpenMP properly.
Here is a minimal reproducer for the issue even without standalone device compilation (and I can build the situation in various other ways as well): https://godbolt.org/z/T1h9b5
Can you add it as a test, I am also curious how the IR looks like with this patch.
Fri, Jan 15
Generally LGTM, assuming you are fine with my comments. If you want to continue the discussion, we can as well.
Add more mustprogress/willreturn recursion tests
(Some ARM-aware ppl should review/accept)
FTR, this is the "corresponding" Attributor patch: D94743
Thu, Jan 14
Could you add the test as well. Also, include the context in the patch. See https://llvm.org/docs/Phabricator.html#phabricator-request-review-web
I don't have a solution to unknown CUDA_VERSION yet.
I don't really like that we have to implement the cuda functions we already implement in places like clang/lib/Headers/__clang_cuda_device_functions.h.
Another problem is we still depend on the CUDA_VERSION and thereby cuda.h. We should forward declare what we need here and put it in the unconditionally included openmp_wrapper headers.
I think this caused a lot of problems for the Attributor tests. I get the "conflict output" warning all the time now :(
Sure. But don't get too distracted here. All of the nvcc stuff will go away.
Remove accidentially included header
I still think we should check to enable a light-attributor run in the beginning of the pipeline. The logic seems sound, could be improved since this does initialize the fixpoint computation with a pessimistic starting value. One comment below, the rest is OK for merging.
Some notable exceptions I think are exclusive loads and stores as well as the memory barrier intrinsics, for which nosync does not apply I think.
Wed, Jan 13
Tue, Jan 12
I think I got it now. Due to the PHI nodes we can follow uses and eventually visit every definition in the loop (we care about). LGTM. Feel free to wait for someone else to look at it, or not.
I left some comments that need to be addressed, not necessarily how I proposed it but somehow. LGTM otherwise.
This spells out uncontroversial effective semantics, so LGTM.
For the attributor, add the tests that don't work with FIXMEs into the attributor test folder please, potentially just into the respective test files, e.g. nonnull.ll
Mon, Jan 11
A few minor wording suggestions and comments, nothing that would require another round of review. Adopt my suggestions as you see fit, add TODOs where it makes sense and something should be investigated in the future. LGTM
Thanks for this.
No doubt there is a problem here somewhere, not sure if this is the best fix though or just hides it one level deeper.
Generally fine, I have one questions though
Other people have input on this?
I forgot to submit some nits yesterday, address what makes sense ^. LGTM
Should we add the, "if broken -> UB!" clause while we are here?
Sun, Jan 10
This looks better.
@Tyker are you going to upstream this? If not, can someone take over?
FWIW, I agree with @nikic, we should not put this logic here. There are two problems:
- We compute something we might not need.
- We do it only in value tracking.
LGTM, please add a commit message explaining what this does though.
LGTM, one comment that should be addressed.
Fri, Jan 8
What we should do, as we move to the OpenMPIRBuilder, is to use runtime interfaces that match OpenMP directives.
Here is the omp parallel one for the host: D94332
The device one should look the same, potentially after extending it, as we want to apply the same logic regardless of the device.
This will also make things like parallel region merging (@ggeorgakoudis) much easier.
Thu, Jan 7
Address comments and improve lang ref
Add !nocapture to the store instruction syntax
Actually remove the OpenMP piece
Add tests and "nocapture_use" handling. Split off OpenMP usage.
One nit, works for me otherwise. Anyone else?