The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing /decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm generally fine with this. To avoid future problems we could make the RefCount private and introduce modifier methods that do the CONSIDERED_INF check always.
Change looks good to me as is. Some notes / questions on the reference counting.
We may regret using long instead of int64_t, if this is used on x86-64 with 32 bit long. E.g. windows. Are we sure a 32 bit counter can't overflow in pathological cases?
Is the signed counter type a performance optimization to hit the UB on overflow? Expected it to be unsigned as the value is never negative (iiuc).
An alternative to the bool is to use ~0 to indicate infinite count. That works on the basis that we don't increment the recount if it's inf, so no overflow, and that recounts can't reach uint64_max on a nominally 64 bit address space.
The bool/long or sentinel value are probably subtle enough to warrant a class recount, which likely already exists in llvm somewhere.
That's a good observation, although I can't think of a normal-use scenario where the refcount could overflow. Of course it can overflow in pathological cases like:
for (huge_number_of_iterations) { #pragma omp target enter data map (var) }
but in such problematic code even a 64-bit counter could overflow. Actually the above example is taylor-made to intentionally make the refcount overflow.
Curiously enough, refcounts can't overflow when bit size matches or exceeds address space.
Each refcount increment indicates some live reference to the object which itself uses some space. So even if we were refcounting an object of size one, each time we copy that object (thus increment the refcount) we also need non-zero bytes of address space for the new reference to that object. That is, we run out of address space shortly before the refcount overflows, even in the most extreme case and disregarding the considerable time it takes to count to 2^48.
Unless it's a <=32 bit counter and we're using a 48 bit address space system, where repeatedly kicking the counter can overflow within a plausible length of time (iirc it's of the order of a few hours spinning on the counter).
I'm fine with this. If others are, please accept and don't wait for me.
openmp/libomptarget/src/device.h | ||
---|---|---|
33 | Nit: public is not needed. |
Thanks for the change to uint64_t
What do you think of computing INFRefCount as (RefCnt == INFRefCount), instead of using the separate bool? It's slightly less obvious what's going on than with the separate flag, but also means the two values can't accidentally drift out of sync.
Nit: public is not needed.