This is an archive of the discontinued LLVM Phabricator instance.

Changed omp_get_max_threads() implementation to more closely match spec description.
ClosedPublic

Authored by estewart08 on Feb 5 2020, 2:14 PM.

Details

Summary

The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.

Diff Detail

Event Timeline

estewart08 created this revision.Feb 5 2020, 2:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2020, 2:14 PM
estewart08 retitled this revision from Changed omp_get_max_threads() implementation to more closely match spec description: "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a... to Changed omp_get_max_threads() implementation to more closely match spec description..Feb 5 2020, 2:30 PM
estewart08 edited the summary of this revision. (Show Details)

This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.

Can you please adjust the test and make it part of the commit.

The reasoning looks OK to me and I like the associated test. I don't understand the behaviour change in max_threads though, it looks like:

#pragma omp target teams map(MaxThreadsL1, MaxThreadsL2) thread_limit(32)      \
    num_teams(1)
  {
    MaxThreadsL1 = omp_get_max_threads();
// ...
  }

  // CHECK: Non-SPMD MaxThreadsL1 = 32                                                                                                                                                                             
  printf("Non-SPMD MaxThreadsL1 = %d\n", MaxThreadsL1);

Which I think is the developer asking for a maximum number of threads of 32 - do you mean both instances of 32 become 64, or just the second one? Agreed that it should be part of this patch, otherwise we'd break the build by committing.

Further, the get_max_threads result might be architecture dependent. Do nvptx and amdgcn return the same value in equivalent contexts? We might need an #ifdef in the test, or a separate instance of the test under amdgcn if they do behave differently here.

I can definitely add the change to max_threads.c to this review. The CHECK would become 64 due to the fact we are counting all threads now with this proposed change 32 thread_limit + 32 master warp.

// CHECK: Non-SPMD MaxThreadsL1 = 64

Yes, the test I proposed would be for nvptx only due to the fact that the other tests reside in the nvptx directory and the original max_threads test was checking nvptx values as well. Is the plan to convert all tests so that they support different architectures in the future and move them to common?

estewart08 updated this revision to Diff 242931.Feb 6 2020, 9:41 AM
  • Update max_threads.c api test to match the change for omp_get_max_threads().
JonChesterfield added a comment.EditedFeb 6 2020, 3:32 PM

Change looks good to me. @jdoerfert, @ABataev, @grokos?

Is the plan to convert all tests so that they support different architectures in the future and move them to common?

That sounds reasonable, though I'm not sure it's established as a plan. A fair amount of (to be added) tests should be totally architecture agnostic so those should end up under common. And some will always be architecture dependent.

jdoerfert accepted this revision.Feb 7 2020, 12:24 AM

This is a bug fix that is correct and helpful, one minor nit inlined but other than that LGTM.

Change looks good to me. @jdoerfert, @ABataev, @grokos?

Is the plan to convert all tests so that they support different architectures in the future and move them to common?

That sounds reasonable, though I'm not sure it's established as a plan. A fair amount of (to be added) tests should be totally architecture agnostic so those should end up under common. And some will always be architecture dependent.

+1, yes most should be independent and only if we have to we go dependent.

I can definitely add the change to max_threads.c to this review. The CHECK would become 64 due to the fact we are counting all threads now with this proposed change 32 thread_limit + 32 master warp.

// CHECK: Non-SPMD MaxThreadsL1 = 64

Yes, the test I proposed would be for nvptx only due to the fact that the other tests reside in the nvptx directory and the original max_threads test was checking nvptx values as well. Is the plan to convert all tests so that they support different architectures in the future and move them to common?

  1. The entire 32 + "32 master warp" is a problematic construction we should get rid of. It's an implementation detail that leaks out and confuses people.
  2. If we have different "warp sizes" we can have multiple check prefixes, especially since we will have to generalize compile-run-and-check anyway.
openmp/libomptarget/deviceRTLs/nvptx/test/api/max_threads.c
22–29

Nit: Please add a "Fixme" comment here explaining why 32, or actually "WARP_SIZE" would be the right thing here but why we see 62 instead.

This revision is now accepted and ready to land.Feb 7 2020, 12:24 AM
estewart08 updated this revision to Diff 243299.Feb 7 2020, 3:12 PM
  • Added FIXME comment to describe change in omp_get_max_threads behavior.

Thanks. Do you have commit access to the llvm github, and if not, would you prefer to wait until that is granted or have someone else land this on your behalf?

I've landed this on Ethan's behalf as I believe he's distracted by non-llvm activities at present.

This revision was automatically updated to reflect the committed changes.