Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libc/src/threads/call_once.cpp | ||
---|---|---|
21 ↗ | (On Diff #263572) | Should we use ONCE_FLAG_INIT instead of 0? |
libc/test/src/threads/call_once_test.cpp | ||
28 | This and the mtx_* functions are not prefaced with __llvm_libc:: | |
30–32 | Maybe we could just make thread_count an atomic type so we're only testing call_once and not mtx_*? Are there some architectures don't have support for locking instructions and must call into libc to support atomic types? |
Address comments.
libc/test/src/threads/call_once_test.cpp | ||
---|---|---|
28 | That was a blunder! Thanks for catching. | |
30–32 | About the various architectures, like how we require the compiler provide freestanding headers, I think we should require that the compiler provide stdatomic.h. It isn't unreasonable: atomics library should be provided by the compiler or by the libc. We want the compiler to do it as they already have knowledge of the target architecture. |
libc/test/src/threads/call_once_test.cpp | ||
---|---|---|
28 | I believe the _Atomic types let you just do += 1 and let the compiler call atomic_fetch_add4 if needed |
This is incorrect.
Completion of an effective call to the call_once function synchronizes with all subsequent calls to the call_once function with the same value of flag.
Before the completion, the subsequent call should wait.
This is correct but slow, because there is always a futex syscall. A better approach is to add at least another state "initialization has finished". If the cas returns that state, don't bother calling wait. (It may also make sense to optimize the initial cas with an acquire load.)
This may also be a good example demonstrating that implementing pthread with C11 threads can cause duplication. In reality few applications use call_once. They all call pthread_once. For pthread_once, you need to think of pthread cancelling, which has to use more mechanism.
I think the logic is correct now, but my pthread_once argument persists. pthread_once provides strictly more features (pthread cancel) and people don't care about the performance of C11 call_once (nobody/few use it).
Piggybacking call_once on top of pthread_once is fine and likely favorable because this makes the code/binary sizes smaller.
libc/src/threads/linux/call_once.cpp | ||
---|---|---|
22 | I don't think we need such special magic numbers instead of sequential 0, 1, 2, 3.. Larger immediates are slower (may require more than one insn) to load on some RISC architectures. |
libc/src/threads/linux/call_once.cpp | ||
---|---|---|
22 | Also, unsigned int -> unsigned |
I have now submitted this after addressing all comments. Thanks a lot @MaskRay for helping me understand the wording in the standard.
I don't think we need such special magic numbers instead of sequential 0, 1, 2, 3..
Larger immediates are slower (may require more than one insn) to load on some RISC architectures.