This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix nextgen plugin behavior when passing negative thread_limit clause
Needs RevisionPublic

Authored by mhalk on Apr 4 2023, 2:14 AM.

Details

Summary

ThreadLimitClause is declared as uint32_t, but could be provided as negative value.
In that case we expect the result of getNumThreads to be PreferredNumThreads -- not MaxNumThreads.
Added testcase 'negative_thread_limit' for verification.

Diff Detail

Event Timeline

mhalk created this revision.Apr 4 2023, 2:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2023, 2:14 AM
mhalk requested review of this revision.Apr 4 2023, 2:14 AM
Herald added a project: Restricted Project. · View Herald Transcript
ronlieb added a subscriber: ronlieb.

@kevinsala to review

This patch is to support setting thread_limit clause to -1? Or is it for another case? If I'm not wrong, the OpenMP standard specifies that the value for thread_limit clause should be positive.

mhalk added a comment.Apr 4 2023, 4:01 AM

As far as I know: No, this patch should avoid exactly that case. (It seems that param would be set to -1 when there's no thread_limit clause, which caused problems.)

mhalk updated this revision to Diff 511895.EditedApr 8 2023, 8:51 AM

Refactored the fix such that it covers not only the '-1' case, but all negative integers.
Added a testcase, also to demonstrate the issue.

edit: for reference -- the mentioned values (like GV_Default_WG_Size) can be found here
OMPGridValues.h

mhalk retitled this revision from [OpenMP] Fix nextgen plugin thread_limit clause bug on generic kernels. to [OpenMP] Fix nextgen plugin thread_limit clause bug when passing negative values..Apr 11 2023, 3:09 AM
mhalk edited the summary of this revision. (Show Details)
jdoerfert added inline comments.Apr 11 2023, 9:36 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
305

I don't understand from the commit message what "the problem" was.
This should be min(MNT, TLC[0] or PNT), no? This should never be negative, even if we interpret it signed later. Is the problem that we pick MNT and not PNT?

Nit: We should check for <s0 with a cast and comparison, I think that makes the intend clearer.

mhalk added inline comments.Apr 12 2023, 3:17 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
305

Thank you for the feedback -- didn't want to obscure the problem or intent; I'll fix these.

Is the problem that we pick MNT and not PNT?

Yes, AFAIK we expected to fallback / pick PNT in such a case.
Was that a wrong assumption?

mhalk updated this revision to Diff 512769.Apr 12 2023, 4:10 AM

Rebase + clarifications on this patch's intent

mhalk retitled this revision from [OpenMP] Fix nextgen plugin thread_limit clause bug when passing negative values. to [OpenMP] Fix nextgen plugin behavior when passing negative thread_limit clause.Apr 12 2023, 4:11 AM
mhalk edited the summary of this revision. (Show Details)
mhalk added a comment.Apr 19 2023, 5:57 AM

Feedback should be addressed from my perspective -- is there anything left to do?
Thanks in advance!

Apologies for the delay.

So, this is good and bad.

The bas part is that it actually doesn't matter if the value is "negative". The standard, at least as far as I can remember, doesn't specify if the value is supposed to be 32 bit signed or something else. Hence, UINT_MAX as thread limit should just be UINT_MAX and not INT_MIN which we handle differently than UINT_MAX/2.

The good part is that we can actually lower this upper bound, at least if it was not forced (which 5.2 or newer allow, IIRC). Thus, we can lower an unrealistic thread limit to something more meaningful if we want to. However, I don't see why we should make the cutoff at "signed bit set" and not earlier.

Overall, I'd like more reasoning for this threshold so we can potentially come up with a more meaningful way of determining the number of threads, e.g., by looking at the resources used.

As far as I know: No, this patch should avoid exactly that case. (It seems that param would be set to -1 when there's no thread_limit clause, which caused problems.)

I didn't see that is the case. https://godbolt.org/z/TdcfxfEej I didn't set the thread limit and then it is 0.
The spec says thread_limit has to be greater than zero. So here using 0 to indicate the clause is not set looks good to me.

I didn't see that is the case. https://godbolt.org/z/TdcfxfEej I didn't set the thread limit and then it is 0.
The spec says thread_limit has to be greater than zero. So here using 0 to indicate the clause is not set looks good to me.

Sorry, there was some initial confusion: after some investigation it turned out that the negative thread_limit came from "outside", which is also why the phrasing of the issue has changed so much.
But as far as I was informed by the reporter of this issue, the result of getNumThreads was expected to be PreferredNumThreads -- instead, when deliberately setting the thread_limit to a negative value, the result is MaxNumThreads (as illustrated in the test).

Apologies for the delay.

So, this is good and bad.

The bas part is that it actually doesn't matter if the value is "negative". The standard, at least as far as I can remember, doesn't specify if the value is supposed to be 32 bit signed or something else. Hence, UINT_MAX as thread limit should just be UINT_MAX and not INT_MIN which we handle differently than UINT_MAX/2.

The good part is that we can actually lower this upper bound, at least if it was not forced (which 5.2 or newer allow, IIRC). Thus, we can lower an unrealistic thread limit to something more meaningful if we want to. However, I don't see why we should make the cutoff at "signed bit set" and not earlier.

Overall, I'd like more reasoning for this threshold so we can potentially come up with a more meaningful way of determining the number of threads, e.g., by looking at the resources used.

No worries!
I like the idea of further lowering the threshold to a "more meaningful" value or determine the return value based on some form of evidence.
I'm primarily curious: "looking at the resources used" -- if you find the time, could you elaborate on that?

gregrodgers accepted this revision.Apr 19 2023, 1:24 PM
gregrodgers added a subscriber: gregrodgers.
gregrodgers added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
305

Yes, spec says it is wrong but spec also says field is int which could have a variable that went incorrectly negative by clumsy user. Suppose it is -1, the current code adds 64 (warpsize) to the uint and get 63. YUCK! Since a negative value is wrong we prefer to treat a negative value at this low in the runtime as unspecified (clause not present) which is 0. To be honest, we are hitting this problem with LLVM IR generated by the legacy fortran which inserted a -1 when there is no thread_limit clause. Yeah we should fix legacy fortran and ensure LLVM flang treats the lack of clause similarly. But we were able to reproduce the fail with a variable in the thread_limit clause that is made to go negative by a clumsy user.

This revision is now accepted and ready to land.Apr 19 2023, 1:24 PM
jdoerfert requested changes to this revision.Apr 19 2023, 4:48 PM
jdoerfert added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp
305

But you don't treat it as 0. That's my point. This patch will use less threads for thread limit 2^31-1 spmd than 2^31 spmd. That doesn't make sense, IMHO.

Let's fix the issue properly, if the additional 64 causes an overflow, don't add them, we will cap the value anyway.

You still want to fix legacy flang if you want preferred number not max number. That said, i don't see why we need a workaround for legacy flang here.

This revision now requires changes to proceed.Apr 19 2023, 4:48 PM

Just remove this part to make it consistent while it will still avoid the overflow:

if (static_cast<int32_t>(ThreadLimitClause[0]) <= 0)
   ThreadLimitClause[0] = PreferredNumThreads;
 else