This allows pthread_self to be pulled out of a loop by LICM.
Details
Diff Detail
- Build Status
Buildable 6111 Build 6111: arc lint + arc unit
Event Timeline
lib/Analysis/TargetLibraryInfo.cpp | ||
---|---|---|
1182–1184 | I'm not entirely sure about this bit. Thread identifiers should be considered opaque: any attempt to use a thread ID other than in pthreads calls is nonportable and can lead to unspecified results. so this assumption of it being an integer type might not hold. |
lib/Analysis/TargetLibraryInfo.cpp | ||
---|---|---|
1182–1184 | Yes, I am a bit ambivalent on this as well when i wrote the patch. |
test/Transforms/LICM/pthread.ll | ||
---|---|---|
6 | Doesn't actually check for speculatable on declaration |
I have an internal synthetic benchmark which is not very interesting. But I imagine its not difficult to come up with a case which a function calls pthread_self on entry and the function is called within a loop. After inlining, the pthread_self is called inside the loop.
BENCHMARK(pthread_self, n) {
for (int i = 0; i < n; i++) { auto id = pthread_self(); folly::doNotOptimizeAway(id); }
}
Looks okay, my POSIX was rusty so I looked at it again and this seems OK to speculate as it has no side-effects.
The thread id is guaranteed to be unique across all *running* threads, but can be reused when a threads joins and another one is created. I don't think this matters for this optimization.
I looked very closely at the opengroup spec and I can't find a paragraph stating that the return of pthread_self() is guaranteed to be unique for the thread lifetime.
I expect any sane implementation to do that, and it's probably implicit.
BTW, GCC moves pthread_self out of the loop on Linux. so I would think this is legal on linux. Unless GCC handles pthread_self differently on different platforms, I think this would be fine.
Also, if you read this carefully http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c02267692&lang=en-us&cc=us
Its saying the "thread ID returned is the same ID that is returned in the thread parameter to the creating thread at thread creation time", basically implying the thread ID returned by pthread_self stays the same
That's just the HP-UX documentation, not the standard.
http://pubs.opengroup.org/onlinepubs/9699919799/
Anyway, after all this elucubration, LGTM.
OK, I think I found out the cause. I guess this patch was wrong, my bad.
GCC doesn't do anything special with pthread_self per-se.
What GCC does is speculating the glibc implementation of pthread_self is declared with __attribute__(const).
The semantic of the attribute is that of "The const attribute is specified as asserting that the function does not examine any data except the arguments. " [1]
If the function has no arguments, it has to return the same value every time.
Therefore, it speculates.
I think that other libc implementation are free to not declare pthread_self with that attribute. In fact, from what I can see, the FreeBSD version doesn't use that argument.
In other words, I don't think we're allowed to do anything with pthread_self() in general as POSIX specifies weaks guarantees.
[1] https://sourceware.org/ml/libc-alpha/2016-04/msg00303.html
Note: whether it's a good idea to declare pthread_self() with attribute const is a different story, but I'll leave the answer to the glibc developers :)
So there a general implication we can implement: __attribute__(const). + zero arguments == speculatable?
Also, I fail to see how it would not be safe to tread pthread_self as speculatable? Same for getpid.
The standard says that the call always succeeds and always returns the thread id. The thread ids are opaque, and I can imagine there being multiple "self" values (pthread_equal would just return true for all of them), thus making pthread_self non-const. However, I can think of no reason why an implementation would do this, don't know of any that do, the behavior would only be observable via some non-standard interface, and I'm happy to cross that bridge if we come to it.
All of that having been said, however, I think there is something we need to clarify about the semantics. If we allow the transformation:
static pthread_t tid; int main() { tid = pthread_self(); ... } void foo() { auto x =tid; bar(x); ... }
to:
int main() { ... } void foo() { auto x =pthread_self(); bar(x); ... }
this might obviously cause problems (if foo() is called from a different thread than main). This transformation might even be reasonable for 'speculatable' functions that are cheap (as pthread_self should be).
So the semantics we need here are a little less than speculatable, or const, put somehow restricted in scope to function-local transformations.
[1] https://sourceware.org/ml/libc-alpha/2016-04/msg00303.html
I'm not sure if that's the exact recipe, but yes, it seems to be on the right path.
Also, I fail to see how it would not be safe to tread pthread_self as speculatable? Same for getpid.
The standard says that the call always succeeds and always returns the thread id. The thread ids are opaque, and I can imagine there being multiple "self" values (pthread_equal would just return true for all of them), thus making pthread_self non-const. However, I can think of no reason why an implementation would do this, don't know of any that do, the behavior would only be observable via some non-standard interface, and I'm happy to cross that bridge if we come to it.
I don't understand the bit about getpid(). In that case forking actually could change the value and you might end up in trouble if you rely on that to write temporary directories (as it's generally done).
All of that having been said, however, I think there is something we need to clarify about the semantics. If we allow the transformation:
static pthread_t tid; int main() { tid = pthread_self(); ... } void foo() { auto x =tid; bar(x); ... }to:
int main() { ... } void foo() { auto x =pthread_self(); bar(x); ... }this might obviously cause problems (if foo() is called from a different thread than main). This transformation might even be reasonable for 'speculatable' functions that are cheap (as pthread_self should be).
So the semantics we need here are a little less than speculatable, or const, put somehow restricted in scope to function-local transformations.
[1] https://sourceware.org/ml/libc-alpha/2016-04/msg00303.html
Oh. You're right. Also, that seems to also rule out this as well. fork() could also change the value of pthread_self() I'd imagine.
It's slightly different, at least in my opinion.
getpid() returns a pid_t.
pid_t is defined to be an integer type, although POSIX doesn't put any restrictions on the size.
http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/types.h.html
glibc (and FreeBSD libc) decide to make it an int. People know it's an integer and use as such.
On the other hand, pthread_t is considered to be an opaque type. It can be an integer/a struct/you name it.
In fact, this completely opaque implementation detail wildly varies across implementations (for LinuxThreads, it's an integer, for NPTL, a pointer).
http://man7.org/linux/man-pages/man3/pthread_self.3.html
Linux also documents that using pthread_t in anything that's not pthread calls results in an unspecified behaviour, while pid_t can be used freely.
So, it's not quite the same, but it's still debatable whether the transformation should be performed or not.
I understand all of this, but the underlying issue is still the same: you can't, in general, move the call past fork(). I see no reason that pthread_equal on the pre-fork and post-fork values would always return true (although this seems to be the case on Linux, even if you fork from a child thread).
Strictly speaking, the glibc attribute violates the specification of const. I.e. we could turn a call for pthread_self() into a once() initialised static variable and that transform would be valid under the const rules. That clearly doesn't reflect the intention...
That said, I do believe it is fully reasonable to make pthread_self() speculatible by default as long as it can be turned off and is properly documented.
I'm not entirely sure about this bit.
IIRC POSIX specifies thread ids to be opaque, see:
so this assumption of it being an integer type might not hold.