ThreadLimitClause is declared as uint32_t, but could be provided as negative value.
In that case we expect the result of getNumThreads to be PreferredNumThreads -- not MaxNumThreads.
Added testcase 'negative_thread_limit' for verification.
Details
- Reviewers
jdoerfert gregrodgers
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch is to support setting thread_limit clause to -1? Or is it for another case? If I'm not wrong, the OpenMP standard specifies that the value for thread_limit clause should be positive.
As far as I know: No, this patch should avoid exactly that case. (It seems that param would be set to -1 when there's no thread_limit clause, which caused problems.)
Refactored the fix such that it covers not only the '-1' case, but all negative integers.
Added a testcase, also to demonstrate the issue.
edit: for reference -- the mentioned values (like GV_Default_WG_Size) can be found here
OMPGridValues.h
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp | ||
---|---|---|
305 | I don't understand from the commit message what "the problem" was. Nit: We should check for <s0 with a cast and comparison, I think that makes the intend clearer. |
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp | ||
---|---|---|
305 | Thank you for the feedback -- didn't want to obscure the problem or intent; I'll fix these.
Yes, AFAIK we expected to fallback / pick PNT in such a case. |
Feedback should be addressed from my perspective -- is there anything left to do?
Thanks in advance!
Apologies for the delay.
So, this is good and bad.
The bas part is that it actually doesn't matter if the value is "negative". The standard, at least as far as I can remember, doesn't specify if the value is supposed to be 32 bit signed or something else. Hence, UINT_MAX as thread limit should just be UINT_MAX and not INT_MIN which we handle differently than UINT_MAX/2.
The good part is that we can actually lower this upper bound, at least if it was not forced (which 5.2 or newer allow, IIRC). Thus, we can lower an unrealistic thread limit to something more meaningful if we want to. However, I don't see why we should make the cutoff at "signed bit set" and not earlier.
Overall, I'd like more reasoning for this threshold so we can potentially come up with a more meaningful way of determining the number of threads, e.g., by looking at the resources used.
I didn't see that is the case. https://godbolt.org/z/TdcfxfEej I didn't set the thread limit and then it is 0.
The spec says thread_limit has to be greater than zero. So here using 0 to indicate the clause is not set looks good to me.
Sorry, there was some initial confusion: after some investigation it turned out that the negative thread_limit came from "outside", which is also why the phrasing of the issue has changed so much.
But as far as I was informed by the reporter of this issue, the result of getNumThreads was expected to be PreferredNumThreads -- instead, when deliberately setting the thread_limit to a negative value, the result is MaxNumThreads (as illustrated in the test).
No worries!
I like the idea of further lowering the threshold to a "more meaningful" value or determine the return value based on some form of evidence.
I'm primarily curious: "looking at the resources used" -- if you find the time, could you elaborate on that?
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp | ||
---|---|---|
305 | Yes, spec says it is wrong but spec also says field is int which could have a variable that went incorrectly negative by clumsy user. Suppose it is -1, the current code adds 64 (warpsize) to the uint and get 63. YUCK! Since a negative value is wrong we prefer to treat a negative value at this low in the runtime as unspecified (clause not present) which is 0. To be honest, we are hitting this problem with LLVM IR generated by the legacy fortran which inserted a -1 when there is no thread_limit clause. Yeah we should fix legacy fortran and ensure LLVM flang treats the lack of clause similarly. But we were able to reproduce the fail with a variable in the thread_limit clause that is made to go negative by a clumsy user. |
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp | ||
---|---|---|
305 | But you don't treat it as 0. That's my point. This patch will use less threads for thread limit 2^31-1 spmd than 2^31 spmd. That doesn't make sense, IMHO. Let's fix the issue properly, if the additional 64 causes an overflow, don't add them, we will cap the value anyway. You still want to fix legacy flang if you want preferred number not max number. That said, i don't see why we need a workaround for legacy flang here. |
Just remove this part to make it consistent while it will still avoid the overflow:
if (static_cast<int32_t>(ThreadLimitClause[0]) <= 0) ThreadLimitClause[0] = PreferredNumThreads; else
I don't understand from the commit message what "the problem" was.
This should be min(MNT, TLC[0] or PNT), no? This should never be negative, even if we interpret it signed later. Is the problem that we pick MNT and not PNT?
Nit: We should check for <s0 with a cast and comparison, I think that makes the intend clearer.