The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 45812 Build 47897: arc lint + arc unit
Event Timeline
This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.
Can you please adjust the test and make it part of the commit.
The reasoning looks OK to me and I like the associated test. I don't understand the behaviour change in max_threads though, it looks like:
#pragma omp target teams map(MaxThreadsL1, MaxThreadsL2) thread_limit(32) \ num_teams(1) { MaxThreadsL1 = omp_get_max_threads(); // ... } // CHECK: Non-SPMD MaxThreadsL1 = 32 printf("Non-SPMD MaxThreadsL1 = %d\n", MaxThreadsL1);
Which I think is the developer asking for a maximum number of threads of 32 - do you mean both instances of 32 become 64, or just the second one? Agreed that it should be part of this patch, otherwise we'd break the build by committing.
Further, the get_max_threads result might be architecture dependent. Do nvptx and amdgcn return the same value in equivalent contexts? We might need an #ifdef in the test, or a separate instance of the test under amdgcn if they do behave differently here.
I can definitely add the change to max_threads.c to this review. The CHECK would become 64 due to the fact we are counting all threads now with this proposed change 32 thread_limit + 32 master warp.
// CHECK: Non-SPMD MaxThreadsL1 = 64
Yes, the test I proposed would be for nvptx only due to the fact that the other tests reside in the nvptx directory and the original max_threads test was checking nvptx values as well. Is the plan to convert all tests so that they support different architectures in the future and move them to common?
Change looks good to me. @jdoerfert, @ABataev, @grokos?
Is the plan to convert all tests so that they support different architectures in the future and move them to common?
That sounds reasonable, though I'm not sure it's established as a plan. A fair amount of (to be added) tests should be totally architecture agnostic so those should end up under common. And some will always be architecture dependent.
This is a bug fix that is correct and helpful, one minor nit inlined but other than that LGTM.
+1, yes most should be independent and only if we have to we go dependent.
- The entire 32 + "32 master warp" is a problematic construction we should get rid of. It's an implementation detail that leaks out and confuses people.
- If we have different "warp sizes" we can have multiple check prefixes, especially since we will have to generalize compile-run-and-check anyway.
openmp/libomptarget/deviceRTLs/nvptx/test/api/max_threads.c | ||
---|---|---|
22 ↗ | (On Diff #242931) | Nit: Please add a "Fixme" comment here explaining why 32, or actually "WARP_SIZE" would be the right thing here but why we see 62 instead. |
Thanks. Do you have commit access to the llvm github, and if not, would you prefer to wait until that is granted or have someone else land this on your behalf?
I've landed this on Ethan's behalf as I believe he's distracted by non-llvm activities at present.