This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget][devicertl][nfc] Simplify target_atomic abstraction
ClosedPublic

Authored by JonChesterfield on Jan 20 2021, 10:01 AM.

Details

Summary

[libomptarget][devicertl][nfc] Simplify target_atomic abstraction

Atomic functions were implemented as a shim around cuda's atomics, with
amdgcn implementing those symbols as a shim around gcc style intrinsics.

This patch folds target_atomic.h into target_impl.h and folds amdgcn.

Further work is likely to be useful here, either changing to openmp's atomic
interface or instantiating the templates on the few used types in order to
move them into a cuda/c++ implementation file. This change is mostly to
group the remaining uses of the cuda api under nvptx' target_impl abstraction.

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.Jan 20 2021, 10:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2021, 10:01 AM
jdoerfert accepted this revision.Jan 20 2021, 11:48 AM

LG, can you replace it with clang or OpenMP atomics next?

This revision is now accepted and ready to land.Jan 20 2021, 11:48 AM
This revision was landed with ongoing or failed builds.Jan 20 2021, 11:51 AM
This revision was automatically updated to reflect the committed changes.

LG, can you replace it with clang or OpenMP atomics next?

Yep, expect so. OpenMP probably expects the variables to have atomic types, C++-style, and they're presently a mix of unadorned and volatile.

I'll see if I can swap them out for clang intrinsics in the first instance. Moving to atomic types will allow device scoped atomics. If we're feeling brave, could also move to a faster memory order than seq_cst.

LG, can you replace it with clang or OpenMP atomics next?

Yep, expect so. OpenMP probably expects the variables to have atomic types, C++-style, and they're presently a mix of unadorned and volatile.

No, OpenMP does not. In fact, I am unsure if it would work if they were.

I'll see if I can swap them out for clang intrinsics in the first instance. Moving to atomic types will allow device scoped atomics. If we're feeling brave, could also move to a faster memory order than seq_cst.

None of the three atomic operations provided by OpenMP, read, write, and update, sound like similar semantics as the four operations here.

None of the three atomic operations provided by OpenMP, read, write, and update, sound like similar semantics as the four operations here.

Please finish reading the section about atomics (https://www.openmp.org/spec-html/5.1/openmpsu105.html#x138-1480002.19.7). All of the above operations are expressible, as far as I can tell.
add and inc are atomic update, exchange is atomic capture, max and cas are atomic compare, all with appropriate expression statement following them.

None of the three atomic operations provided by OpenMP, read, write, and update, sound like similar semantics as the four operations here.

Please finish reading the section about atomics (https://www.openmp.org/spec-html/5.1/openmpsu105.html#x138-1480002.19.7). All of the above operations are expressible, as far as I can tell.
add and inc are atomic update, exchange is atomic capture, max and cas are atomic compare, all with appropriate expression statement following them.

AFAIK, atomic compare is a feature of 5.1, we have just read, write, update and capture.

None of the three atomic operations provided by OpenMP, read, write, and update, sound like similar semantics as the four operations here.

Please finish reading the section about atomics (https://www.openmp.org/spec-html/5.1/openmpsu105.html#x138-1480002.19.7). All of the above operations are expressible, as far as I can tell.
add and inc are atomic update, exchange is atomic capture, max and cas are atomic compare, all with appropriate expression statement following them.

All these operations should be capture because they're composite. For example, atomicAdd should be something like:

int atomicAdd(int *Addr, int Val) {
    int Old;
#pragma omp atomic capture
    { Old = *Addr; *Addr += Val; }
    return Old;
}
openmp/libomptarget/deviceRTLs/common/src/reduction.cu