This is an archive of the discontinued LLVM Phabricator instance.

Do not always request an implicit taskgroup region inside the kmpc_taskloop function
ClosedPublic

Authored by smateo on Oct 24 2018, 3:37 AM.

Details

Summary

For the following code:

int i;
#pragma omp taskloop
for (i = 0; i < 100; ++i)
{}

#pragma omp taskloop nogroup
for (i = 0; i < 100; ++i)
{}

Clang emits the following LLVM IR:

...
 call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0)
 %2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*))
 ...
 call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null)
 call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0)


 ...
 %15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*))
 ...
 call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null)

The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR: the __kmpc_taskloop occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the nogroup clause.

The issue here is that the fourth argument from the end of the __kmpc_taskloop call is always zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter:

void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
                     kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup,
                     int sched, kmp_uint64 grainsize, void *task_dup);

So basically we always tell the RT to create another taskgroup region. For the first taskloop, this means that we create two nested taskgroup regions. For the second taskloop, it means that, despite the nogroup clause, we still get a taskgroup region, so we unnecessarily wait until all descendant tasks have been executed.
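The counting argument above can be sketched as a tiny model in C. This is only an illustration under stated assumptions: the function names (compiler_taskgroups, runtime_taskgroups, total_taskgroups, check_reported_behavior) are hypothetical and are not part of Clang or the LLVM OpenMP RT. The model assumes, as described in the summary, that the compiler emits one taskgroup region unless the nogroup clause is present, and that the RT creates one more whenever the nogroup argument is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not RT code): taskgroup regions emitted by the compiler
   around the __kmpc_taskloop call. One region unless nogroup is specified. */
int compiler_taskgroups(bool has_nogroup_clause) {
    return has_nogroup_clause ? 0 : 1;
}

/* Hypothetical model (not RT code): taskgroup regions the RT creates itself.
   A zero nogroup argument asks the RT for a taskgroup region. */
int runtime_taskgroups(int nogroup_arg) {
    return nogroup_arg == 0 ? 1 : 0;
}

/* Total number of taskgroup regions surrounding the generated tasks. */
int total_taskgroups(bool has_nogroup_clause, int nogroup_arg) {
    return compiler_taskgroups(has_nogroup_clause) +
           runtime_taskgroups(nogroup_arg);
}

/* Reproduces the two cases from the summary, where the argument is always 0. */
void check_reported_behavior(void) {
    assert(total_taskgroups(false, 0) == 2); /* plain taskloop: two regions */
    assert(total_taskgroups(true, 0) == 1);  /* nogroup: still one region */
}
```

In this model, a nonzero nogroup argument would give one region for the plain taskloop and none for the nogroup variant, which is what the OpenMP semantics call for. The nonzero value is used here only to exercise the model, not to describe the committed change.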

Diff Detail

Repository
rC Clang

Event Timeline

smateo created this revision. Oct 24 2018, 3:37 AM
This revision is now accepted and ready to land. Oct 24 2018, 7:37 AM

Hi Alexey,

Thanks for the prompt review!

I don't have commit access yet. Would you mind committing it for me?

Thanks!

Sure, no problems, thanks for the patch!

This revision was automatically updated to reflect the committed changes.