This patch modifies the default target kernel launch parameters (num_teams and thread_limit). The default thread_limit is set to 128 threads per team. In SPMD mode the kernel is launched with 128 threads; in non-SPMD mode it is launched with 96 threads plus the 32 threads of the master warp, for 128 in total.
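The thread split can be written out as a small C sketch. The names DEFAULT_THREAD_LIMIT, WARP_SIZE, default_worker_threads and default_launched_threads are hypothetical and chosen for illustration; they are not identifiers from the patch. The warp size of 32 is taken from the master warp size quoted above.

```c
#include <assert.h>

/* Hypothetical constants mirroring the numbers in the description. */
enum { DEFAULT_THREAD_LIMIT = 128, WARP_SIZE = 32 };

/* Threads per team that are not part of the master warp.
 * SPMD mode: all 128 threads. Non-SPMD mode: 128 - 32 = 96. */
static int default_worker_threads(int is_spmd) {
    assert(is_spmd == 0 || is_spmd == 1);
    return is_spmd ? DEFAULT_THREAD_LIMIT : DEFAULT_THREAD_LIMIT - WARP_SIZE;
}

/* Total threads launched per team: the 96 threads plus the master warp
 * in non-SPMD mode, so the total is 128 in both modes. */
static int default_launched_threads(int is_spmd) {
    int workers = default_worker_threads(is_spmd);
    return is_spmd ? workers : workers + WARP_SIZE;
}
```

Under these assumptions both modes launch the same number of threads per team; only the number of threads outside the master warp differs.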
The default number of teams has been optimized as follows. For the constructs below:
#target teams distribute
#teams distribute
#target teams distribute simd
#teams distribute simd
if the associated loop trip count is N, then the kernel is launched with N teams.
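
As an illustration, the following C sketch shows one of the listed constructs without an explicit num_teams clause. The function scale and the helper default_num_teams are hypothetical; the helper only models the rule stated above and is not code from the patch. When compiled without OpenMP support the pragma is ignored and the loop runs serially on the host.

```c
#include <assert.h>

/* Hypothetical model of the default described above:
 * a loop with trip count N is launched with N teams. */
static long default_num_teams(long trip_count) {
    assert(trip_count >= 0);
    return trip_count;
}

/* Hypothetical example kernel: scales n floats in place.
 * No num_teams clause is given, so the default applies; with the
 * rule above, this loop of trip count n would be launched with n teams. */
static void scale(float *a, int n, float s) {
    assert(n >= 0);
    #pragma omp target teams distribute map(tofrom: a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] *= s;
}
```

An explicit num_teams or thread_limit clause on the construct would take the place of these defaults.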