Page MenuHomePhabricator

[OpenMP][DeviceRTL] Remove `atomic::store`
AbandonedPublic

Authored by tianshilei1992 on Jul 18 2022, 11:28 AM.

Details

Reviewers
jdoerfert
jhuber6
Summary

atomic::store currently is being used by AMDGPU for named barrier. Internally
it calls __atomic_store_n compiler builtin. However, NVPTX backends doesn't
support AtomicStore instruction, causing backend crash when calling llc on
the device runtime directly, where atomic::store is not optimized out. This
patch removes atomic::store from public area and uses __atomic_store_n
directly where it is needed as a workaround. We can come back and revisit it in
the future is atomic::store is needed somewhere else.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2022, 11:28 AM
tianshilei1992 requested review of this revision.Jul 18 2022, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2022, 11:28 AM
jdoerfert accepted this revision.Jul 18 2022, 11:38 AM

LG, though the backend should be able to handle this, IMHO

This revision is now accepted and ready to land.Jul 18 2022, 11:38 AM
tianshilei1992 planned changes to this revision.Jul 18 2022, 11:45 AM

LG, though the backend should be able to handle this, IMHO

Yeah, __atomic_exchange_n is expanded based on the ordering, and one of the expansion is still atomic store, which goes back to current situation. atomic::store is only being used by AMDGPU to implement named barrier. I'm wondering if we want to drop atomic::store and use the __atomic_store_n wherever it needs?

This revision is now accepted and ready to land.Jul 18 2022, 11:59 AM
tianshilei1992 edited the summary of this revision. (Show Details)Jul 18 2022, 12:00 PM
tianshilei1992 retitled this revision from [OpenMP][DeviceRTL] Use `__atomic_exchange_n` to implement atomicStore to [OpenMP][DeviceRTL] Remove `atomic::store`.Jul 18 2022, 12:00 PM
tianshilei1992 requested review of this revision.Jul 18 2022, 12:09 PM
jdoerfert added a comment.EditedJul 19 2022, 12:51 PM

No, this is strictly worse. If anything, we can introduce a switch to ensure the ordering is known statically. We can use a macro and do it for all of the atomic ops.

No, this is strictly worse. If anything, we can introduce a switch to ensure the ordering is known statically. We can use a macro and do it for all of the atomic ops.

It doesn't work because __atomic_exchange_n with __ATOMIC_RELEASE will be lowered to atomicLoad, which again causes the issue.

tianshilei1992 abandoned this revision.Sun, Sep 4, 12:17 PM