This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add implementation of call_once from threads.h.
ClosedPublic

Authored by sivachandra on May 12 2020, 4:50 PM.

Diff Detail

Event Timeline

sivachandra created this revision.May 12 2020, 4:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2020, 4:50 PM
abrachet added inline comments.May 13 2020, 12:52 AM
libc/src/threads/call_once.cpp
21 ↗(On Diff #263572)

Should we use ONCE_FLAG_INIT instead of 0?

libc/test/src/threads/call_once_test.cpp
28

This and the mtx_* functions are not prefaced with __llvm_libc::

30–32

Maybe we could just make thread_count an atomic type so we're only testing call_once and not mtx_*?

Are there some architectures don't have support for locking instructions and must call into libc to support atomic types?

sivachandra marked 5 inline comments as done.

Address comments.

libc/test/src/threads/call_once_test.cpp
28

That was a blunder! Thanks for catching.

30–32

About the various architectures, like how we require the compiler provide freestanding headers, I think we should require that the compiler provide stdatomic.h. It isn't unreasonable: atomics library should be provided by the compiler or by the libc. We want the compiler to do it as they already have knowledge of the target architecture.

abrachet accepted this revision.May 13 2020, 10:41 AM
abrachet added inline comments.
libc/test/src/threads/call_once_test.cpp
28

I believe the _Atomic types let you just do += 1 and let the compiler call atomic_fetch_add4 if needed

This revision is now accepted and ready to land.May 13 2020, 10:41 AM

This is incorrect.

Completion of an effective call to the call_once function synchronizes with all subsequent calls to the call_once function with the same value of flag.

Before the completion, the subsequent call should wait.

This is incorrect.

Completion of an effective call to the call_once function synchronizes with all subsequent calls to the call_once function with the same value of flag.

Before the completion, the subsequent call should wait.

Thanks for catching this. Will fix it soon.

Synchronize callers to the return from the called function.

This looks good to me but it might be worth waiting for @MaskRay if he has anything to say because I missed this the first time.

libc/test/src/threads/call_once_test.cpp
61

Maybe we could assert that done_count is 0 before unlocking?

MaskRay added a comment.EditedMay 16 2020, 9:18 PM

This is correct but slow, because there is always a futex syscall. A better approach is to add at least another state "initialization has finished". If the cas returns that state, don't bother calling wait. (It may also make sense to optimize the initial cas with an acquire load.)

This may also be a good example demonstrating that implementing pthread with C11 threads can cause duplication. In reality few applications use call_once. They all call pthread_once. For pthread_once, you need to think of pthread cancelling, which has to use more mechanism.

Make syscall only if required.

I think the logic is correct now, but my pthread_once argument persists. pthread_once provides strictly more features (pthread cancel) and people don't care about the performance of C11 call_once (nobody/few use it).
Piggybacking call_once on top of pthread_once is fine and likely favorable because this makes the code/binary sizes smaller.

libc/src/threads/linux/call_once.cpp
22

I don't think we need such special magic numbers instead of sequential 0, 1, 2, 3..

Larger immediates are slower (may require more than one insn) to load on some RISC architectures.

MaskRay added inline comments.May 21 2020, 6:16 PM
libc/src/threads/linux/call_once.cpp
22

Also, unsigned int -> unsigned

I have now submitted this after addressing all comments. Thanks a lot @MaskRay for helping me understand the wording in the standard.

This revision was automatically updated to reflect the committed changes.