
Jul 26 2019

__simt__ added a comment to D65348: enable <atomic> header on systems without thread-support.

I think we want people to use -ffreestanding more for things like this, so the part where this is circumvented isn't clearly the right choice. The rest seems fine, and is exactly the reason why I created _LIBCPP_ATOMIC_ONLY_USE_BUILTINS.
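For context, a minimal sketch of what gating on that macro can look like (illustrative only; the guard placement and the fallback body are assumptions, not the actual libc++ source):

    // Hypothetical: when only compiler builtins may be used (no
    // libatomic.a), route operations through the __atomic_* builtins.
    #if defined(_LIBCPP_ATOMIC_ONLY_USE_BUILTINS)
    template <class _Tp>
    bool __cxx_atomic_compare_exchange(_Tp* __ptr, _Tp* __expected,
                                       _Tp __desired) {
      return __atomic_compare_exchange(__ptr, __expected, &__desired,
                                       /*weak=*/false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }
    #endif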

Jul 26 2019, 2:03 PM

Jun 13 2019

__simt__ added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1542042, @tra wrote:

C++ volatile will give you that. You also rely on atomicity. C++ volatile does not guarantee that, even if NVPTX does happen to. It's a mere coincidence. What if the next PTX revision provides a better way to implement C++ volatile without providing atomicity? Then your code will all of a sudden break in new and exciting ways.

Jun 13 2019, 10:26 AM · Restricted Project
__simt__ added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

No, I'm not relying on the non-optimization of atomics. I need both volatile semantics and atomicity: the compiler must not optimize out the memory accesses, and each access must be atomic.

Jun 13 2019, 10:16 AM · Restricted Project
__simt__ added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1541969, @tra wrote:

@reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.

Do I understand it correctly that the argument is about using C++ volatile with the assumption that those will map to ld.volatile/st.volatile, which happen to provide sufficiently strong guarantees to be the equivalent of LLVM's atomic monotonic operations?

If that's the case, then I would agree that it's an implementation detail one should not be relying on. If atomicity is needed, we should figure out a way to express that in C++.

In practice, the standard C++ library support in CUDA is somewhat limited. I don't think atomic<> works, so volatile might be a 'happens to work' short-term workaround. If that's the case, then there should be a comment describing what's going on here and a TODO to fix it when better support for atomics on GPU is available.

Another option would be to use atomic*() functions provided by CUDA, but those do have limited functionality on older GPUs.

Yet another alternative is to explicitly use ld.volatile/st.volatile from inline assembly. I don't think we have it plumbed as a clang builtin.

Artem, thanks for your comment. But we need not only atomicity; we also need to tell the ptxas compiler not to optimize accesses to parallelLevel. According to several comments (like this one: https://stackoverflow.com/a/1533115), the volatile modifier is the recommended tool for such cases in CUDA. If clang does not provide the required level of support for CUDA volatile, I think that is an incompatibility with nvcc.
Also, I have already thought about using PTX instructions directly. Probably you're right that it would be better to use them, if you're sure that there is a difference between nvcc and clang; see the sketch below.
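For reference, a minimal sketch of what using ld.volatile directly could look like via inline assembly (the helper name is hypothetical; as noted above, there is no clang builtin for this):

    // Hypothetical device-side helper: a 32-bit load expressed directly
    // as PTX ld.volatile, independent of what C++ 'volatile' lowers to.
    __device__ int loadParallelLevel(const int* ptr) {
      int value;
      asm volatile("ld.volatile.global.u32 %0, [%1];"
                   : "=r"(value)
                   : "l"(ptr)
                   : "memory");
      return value;
    }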

Jun 13 2019, 9:47 AM · Restricted Project

Jun 11 2019

__simt__ added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

I know some things about CUDA, volatile, and C++. Let's see if I can understand the part of the proposed change that involves these. I don't understand the part about the test, but I don't need to, so I'll ignore it.

The gist of the issue is that the parallelLevel table should really be atomic<> because of data races on it (that was a bug prior to this; maybe there are more such bugs lying around), except there's a problem with the obvious fix: atomic<> is not available in CUDA (yet), so it's not an option for fixing this issue. The next best thing we have instead is volatile.

Now, volatile in PTX (e.g. asm("ld.volatile...[x];")) and volatile in C++ source like this (e.g. "volatile ...x;") are not exactly the same thing. When CUDA says that volatile is equivalent to memory_order_relaxed, it's saying something clear (I think) about the PTX-level code, but it's still being pretty vague about the CUDA C++-level code. OTOH it's entirely possible for Clang to do something with either atomic<> or volatile that isn't valid for the other -- and that's a different can of worms than, say, NVCC, which does {whatever NVCC does, it's not the compiler you're using}.

However, since people already use CUDA C++ volatile this way a lot, you can't really have a good CUDA toolchain unless it is the case (even if by accident) that it works for this purpose. In other words, it's probably reasonable to assume this in your code, because everyone else's code would be on fire otherwise, and it's not on fire, so far as we can tell.

It might be worth it to prepare your code for the eventual arrival of atomic<> on CUDA. That is, maybe create a template alias on T with some helper functions, just enough for your use. It might make this code more self-documenting and make it easy to make it 100% legit later on.

Hi Olivier, thanks for your comments. You're absolutely right. Actually, we're using both compilers, nvcc and clang (under different conditions, though). Marking the variable volatile does not break it at the LLVM level. Maybe that is by accident, but I rather doubt it.
Do you suggest creating a template function that provides access to the parallelLevel variable? And when atomic<> is supported by CUDA, change the type of this variable to atomic<> so the compiler can automatically instantiate this template function with the proper type, right? Or do you have something different in mind? If so, could you provide a small example of your idea?
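For illustration, a minimal sketch of the kind of wrapper being discussed (all names are hypothetical, not from the patch): volatile accesses today, with an obvious place to swap in atomic<> once CUDA supports it.

    // Hypothetical wrapper: an alias plus accessors, self-documenting and
    // easy to retarget to atomic<T> later without touching call sites.
    template <class T>
    using ParallelCounter = volatile T;   // later: atomic<T>

    template <class T>
    __device__ T counterLoad(ParallelCounter<T>& c) {
      return c;                           // later: c.load(memory_order_relaxed)
    }

    template <class T>
    __device__ void counterStore(ParallelCounter<T>& c, T v) {
      c = v;                              // later: c.store(memory_order_relaxed)
    }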

Jun 11 2019, 2:37 PM · Restricted Project
__simt__ added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

I know some things about CUDA, volatile, and C++. Let's see if I can understand the part of the proposed change that involves these. I don't understand the part about the test, but I don't need to, so I'll ignore it.

Jun 11 2019, 1:56 PM · Restricted Project

Mar 4 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

This commit broke the atomic lldb data formatter.

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/21037/

@shafik @jingham

Mar 4 2019, 11:10 AM · Restricted Project

Mar 3 2019

__simt__ added inline comments to D56913: decoupling Freestanding atomic<T> from libatomic.a.
Mar 3 2019, 9:37 AM · Restricted Project

Feb 25 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

Hey guys, can I get additional feedback or else proceed with this patch?

Feb 25 2019, 2:29 PM · Restricted Project

Feb 14 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

Dear other reviewers, what else needs to happen here?

Feb 14 2019, 9:13 AM · Restricted Project

Feb 12 2019

__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

This version addresses the preceding comments and passes the libcxx tests across C++03, 11, 14, and 17 modes.

Feb 12 2019, 10:19 AM · Restricted Project

Feb 11 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

Would it make sense to decide whether we want to use GCC's non-lockfree atomics or not based on a configuration macro that's not _LIBCPP_FREESTANDING?
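A rough sketch of what such a dedicated configuration macro could look like (the macro names and derivation are assumptions for illustration, not libc++'s actual configuration):

    // Hypothetical: derive the non-lock-free builtin path from its own
    // knob rather than keying code directly off _LIBCPP_FREESTANDING.
    #if defined(_LIBCPP_FREESTANDING) && \
        !defined(_LIBCPP_ATOMIC_USE_LIBATOMIC)
    #  define _LIBCPP_ATOMIC_ONLY_USE_BUILTINS
    #endif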

Feb 11 2019, 11:52 AM · Restricted Project

Feb 8 2019

__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

This version passes the libcxx tests with each combination of paths that can be used (force GCC, force C11, force freestanding + non-lock-free). There were quite a few problems around volatile that I had not yet addressed; apologies.

Feb 8 2019, 2:59 PM · Restricted Project

Feb 7 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

I need to test the GCC path better, it still has some bugs. Be right back.

Feb 7 2019, 9:21 AM · Restricted Project

Feb 6 2019

__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

In this version:

Feb 6 2019, 9:15 PM · Restricted Project

Feb 5 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

I will come back with another patch that addresses these comments.

Feb 5 2019, 9:28 PM · Restricted Project

Feb 4 2019

__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

In this version I've restored the __cxx_atomic_... layer to which both the GCC and C11 backends map. This addresses the comment about introducing more functions named __c11_atomic... that are not part of the C11 builtin set. I no longer introduce any new versions of _Atomic.
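A simplified sketch of the layering described (illustrative only, not the patch itself): the __cxx_atomic_* shim dispatches to either the GCC __atomic_* builtins or the C11 __c11_atomic_* builtins.

    // Illustrative only: one shim name, two possible backends.
    #ifdef _LIBCPP_HAS_GCC_ATOMIC_IMP
    inline void __cxx_atomic_store(volatile int* __p, int __v, int __order) {
      __atomic_store_n(__p, __v, __order);     // GCC/Clang builtin backend
    }
    #else
    inline void __cxx_atomic_store(volatile _Atomic(int)* __p, int __v,
                                   int __order) {
      __c11_atomic_store(__p, __v, __order);   // C11 builtin backend
    }
    #endif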

Feb 4 2019, 5:37 PM · Restricted Project
__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

Thanks for the comments, Louis, responses below.

Feb 4 2019, 9:26 AM · Restricted Project

Jan 31 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.
In D56913#1379046, @jfb wrote:

Should all these macros be in __config?

Jan 31 2019, 12:18 PM · Restricted Project
__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

Removed an inadvertent #define left in there for testing.

Jan 31 2019, 9:02 AM · Restricted Project
__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

It would be better if it passed the libcxx tests with the feature turned on. Like it does now.

Jan 31 2019, 8:56 AM · Restricted Project
__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

Fixed some spurious whitespace changes I didn't intend.

Jan 31 2019, 7:34 AM · Restricted Project
__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

Simplified the changes significantly. By switching my back-end to slide under the C11 side instead of the GCC/Clang side, I can live without the new interposer layer.

Jan 31 2019, 7:31 AM · Restricted Project

Jan 30 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

Quick update before a longer update: I have a simpler patch on the way.

Jan 30 2019, 10:16 PM · Restricted Project

Jan 18 2019

__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.
In D56913#1363801, @jfb wrote:

One downside to forwarding layers is that if you don't have an always_inline attribute, some builds might not inline. This leads to code bloat and crappy codegen, because the memory_order argument is no longer a known constexpr value. Maybe we should have always_inline? (See the sketch below.)

You don't define LIBCXX_FREESTANDING yet, right?

I think at a high level this looks fine.
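To illustrate the inlining concern, a minimal hedged sketch (the shim name and shape are assumptions, not the patch's code): with a forced-inline attribute, the order argument is still a compile-time constant at the builtin call site after inlining.

    // Hypothetical forwarding shim: forced inline so the memory-order
    // argument stays a known constant when it reaches the builtin.
    __attribute__((__always_inline__)) inline int
    __cxx_atomic_load(const volatile int* __ptr, int __order) {
      return __atomic_load_n(__ptr, __order);
    }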

Jan 18 2019, 5:27 PM · Restricted Project
__simt__ updated the diff for D56913: decoupling Freestanding atomic<T> from libatomic.a.

Updated the patch with a bit higher-quality and better-tested code than what I originally showed.

Jan 18 2019, 5:14 PM · Restricted Project
__simt__ added a comment to D56534: [Verifier] Add verification of unaligned atomic load/store.

With apologies, I would just like to tack a note into this thread that the entire *field* of formal memory model proofs involving partially-overlapping atomics is a single paper (last I knew, https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf). The paper says mixed-size, but the key point is not-perfectly-overlapping.

Jan 18 2019, 3:42 PM
__simt__ added a comment to D56913: decoupling Freestanding atomic<T> from libatomic.a.

Just a clarification - please evaluate the design aspects first. There are nits that I know are wrong and am still working on.

Jan 18 2019, 1:24 PM · Restricted Project
__simt__ created D56913: decoupling Freestanding atomic<T> from libatomic.a.
Jan 18 2019, 7:59 AM · Restricted Project

Dec 10 2018

__simt__ added a comment to D55517: Remove `_VSTD`.

Hello Eric, everyone else,

Dec 10 2018, 2:16 PM