This is an archive of the discontinued LLVM Phabricator instance.

[libc][rpc] Update locking to work on volta
ClosedPublic

Authored by JonChesterfield on May 4 2023, 1:14 PM.

Details

Summary

Carefully work around not knowing the thread mask that nvptx intrinsic
functions require.

If the warp is converged when calling try_lock, a single rpc call will handle
all lanes within it. Otherwise, more than one rpc call will occur, with thread
masks that together compose the unknown one.

Event Timeline

JonChesterfield created this revision. May 4 2023, 1:14 PM
Herald added projects: Restricted Project, Restricted Project. May 4 2023, 1:14 PM
JonChesterfield requested review of this revision. May 4 2023, 1:14 PM

We might end up needing memory fences in lock/unlock, but I'm hopeful the ones associated with send/recv will be sufficient.

Unlock is straightforward - fetch_and is doing the same thing as store zero, except making sure that at most one write of zero occurs in the warp.
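To illustrate the unlock idea, here is a minimal host-side sketch. It is not the code from rpc.h: the warp is simulated with a sequential loop over 32 lanes, and the names unlock_sim and is_first_lane_sim are hypothetical stand-ins. It assumes the lock is held when bit 0 of the word is set, and that the "first lane" is the lowest set bit of lane_mask.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for is_first_lane: true only for the lane whose bit
// is the lowest set bit of lane_mask.
inline bool is_first_lane_sim(uint64_t lane_mask, uint32_t lane_id) {
  uint64_t lowest = lane_mask & (~lane_mask + 1); // isolate lowest set bit
  return (uint64_t{1} << lane_id) == lowest;
}

// Simulates every lane in lane_mask calling unlock on the same lock word.
// Returns how many lanes actually changed bit 0 from one to zero.
inline int unlock_sim(std::atomic<uint32_t> &lock, uint64_t lane_mask) {
  int clears = 0;
  for (uint32_t lane = 0; lane < 32; ++lane) {
    if (!(lane_mask & (uint64_t{1} << lane)))
      continue; // this lane does not take part
    // The first lane ANDs with ~1 (clears bit 0). Every other lane ANDs with
    // all ones, which leaves the word unchanged. No branch around the atomic.
    uint32_t and_mask = ~(is_first_lane_sim(lane_mask, lane) ? 1u : 0u);
    uint32_t before = lock.fetch_and(and_mask, std::memory_order_relaxed);
    if ((before & 1u) && !(and_mask & 1u))
      ++clears;
  }
  return clears;
}
```

Every lane executes the same fetch_and, so control flow stays uniform, yet only one lane can drop the lock.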

Lock took a while to derive. The problem is that fetch_or can be run by threads which are not in lane_mask, or by a subset of lane_mask, and the threads outside of lane_mask may be the ones that win the fetch_or to set the bit. It might be clearer to use is_first_lane in lock instead of having all threads try to take the lock.
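One way to picture the problem is the host-side sketch below. It is a simulation under stated assumptions, not the patch itself: lanes run sequentially, try_lock_sim and extra_lanes are hypothetical names, and the ballot is modelled as collecting the pre-OR value only from lanes inside lane_mask. Lanes outside lane_mask OR with zero, so they can never be the ones that set the bit.

```cpp
#include <atomic>
#include <cstdint>

// Simulates a warp calling try_lock. lane_mask holds the lanes that should
// own the lock on success; extra_lanes holds lanes that happen to be running
// but are not in lane_mask. Returns true if the lanes in lane_mask took it.
inline bool try_lock_sim(std::atomic<uint32_t> &lock, uint64_t lane_mask,
                         uint64_t extra_lanes) {
  uint64_t running = lane_mask | extra_lanes;
  uint64_t packed = 0; // modelled ballot result over lane_mask
  for (uint32_t lane = 0; lane < 32; ++lane) {
    uint64_t bit = uint64_t{1} << lane;
    if (!(running & bit))
      continue;
    // Lanes outside lane_mask OR with 0, a no-op that cannot claim the lock.
    uint32_t in_mask = (lane_mask & bit) ? 1u : 0u;
    uint32_t before = lock.fetch_or(in_mask, std::memory_order_relaxed);
    // Only lanes in lane_mask contribute; a set bit means "lock already held".
    if (in_mask && (before & 1u))
      packed |= bit;
  }
  // If every lane in lane_mask saw the bit already set, nobody in the mask
  // won. Any difference means exactly one lane in the mask set the bit.
  return packed != lane_mask;
}
```

In this model the result is the same for every lane in lane_mask, whichever lane happened to win the fetch_or.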

Both avoid branching in is_first_lane, partly on performance grounds and partly because simpler control flow seems to compile more reliably. This might allow removing the broadcast functions in util.

jhuber6 updated this revision to Diff 519649. May 4 2023, 2:04 PM

Fix missing header

jhuber6 accepted this revision. May 4 2023, 2:11 PM

Logic is harder to understand, but is probably more correct.

libc/src/__support/RPC/rpc.h, line 158

We might need another lane sync here. I'll test when I put the parallelism back in.

This revision is now accepted and ready to land. May 4 2023, 2:11 PM
  • update comment
This revision was automatically updated to reflect the committed changes.