This is an archive of the discontinued LLVM Phabricator instance.

[OMPIRBuilder] Generate aggregate argument for parallel region outlined functions
ClosedPublic

Authored by ggeorgakoudis on Sep 20 2021, 4:45 PM.

Details

Summary

This patch changes code generation in OpenMPIRBuilder so that the arguments of the parallel region outlined function, other than global_tid and bound_tid, are passed in a single aggregate (struct). It relies on the updated CodeExtractor (see D96854) and mirrors the corresponding functionality of Clang codegen (see D102107).
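
As a rough illustration of the calling convention the summary describes (not the IR that OpenMPIRBuilder actually emits), the C++ sketch below models an outlined function that takes the two thread-id pointers followed by one pointer to a struct of captured variables. The names `CapturedArgs`, `outlined_body`, and `run_region` are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Hypothetical aggregate: one pointer per captured variable.
struct CapturedArgs {
  int *in;
  int *out;
};

// Models the outlined function signature: global_tid, bound_tid,
// then a single pointer to the aggregate instead of one parameter
// per captured variable.
static void outlined_body(int *global_tid, int *bound_tid, void *raw) {
  (void)global_tid;
  (void)bound_tid;
  assert(raw != nullptr);
  auto *args = static_cast<CapturedArgs *>(raw);
  *args->out = *args->in + 1;
}

// Models the caller: allocate the aggregate, fill it, pass its address.
int run_region(int in) {
  int out = 0;
  int gtid = 0, btid = 0;
  CapturedArgs args{&in, &out};
  outlined_body(&gtid, &btid, &args);
  return out;
}
```

With this shape the outlined function's parameter list stays fixed no matter how many variables the region captures; only the struct layout changes.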

Event Timeline

ggeorgakoudis created this revision. Sep 20 2021, 4:45 PM
ggeorgakoudis requested review of this revision. Sep 20 2021, 4:45 PM
ggeorgakoudis edited the summary of this revision. Sep 20 2021, 4:50 PM
ggeorgakoudis added a reviewer: jhuber6.
jdoerfert accepted this revision. Oct 7 2021, 10:05 AM

LG. Can be merged once the requirement is in.

This revision is now accepted and ready to land. Oct 7 2021, 10:05 AM
This revision was landed with ongoing or failed builds. Jan 25 2022, 6:25 PM
This revision was automatically updated to reflect the committed changes.

If I understand this correctly, the aggregate struct is filled only on the parallel path and not on the serialized parallel path. In the example below, the call to @sb_..omp_par through __kmpc_fork_call in block omp_parallel is preceded by the store that fills the aggregate, while the direct call to @sb_..omp_par in the serialized parallel region (block 6) passes %structArg without that store. I assume the extractor creates these instructions and emits them only at the call site it generates. Would copying the instructions that fill the aggregate into the serialized parallel region be a reasonable solution?

CC: @Meinersbur

define void @sb_(ptr %0, ptr %1) !dbg !3 {
  %structArg = alloca { ptr }, align 8
  %tid.addr = alloca i32, align 4, !dbg !7
  %zero.addr = alloca i32, align 4, !dbg !7
  store i32 0, ptr %tid.addr, align 4, !dbg !7
  store i32 0, ptr %zero.addr, align 4, !dbg !7
  %3 = load i32, ptr %0, align 4, !dbg !7
  %4 = icmp ne i32 %3, 0, !dbg !7                 
  br label %entry, !dbg !7

entry:                                            ; preds = %2
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1), !dbg !7
  br i1 %4, label %5, label %6

5:                                                ; preds = %entry
  br label %omp_parallel

omp_parallel:                                     ; preds = %5
  %gep_ = getelementptr { ptr }, ptr %structArg, i32 0, i32 0
  store ptr %1, ptr %gep_, align 8
  call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @1, i32 1, ptr @sb_..omp_par, ptr %structArg), !dbg !9
  br label %omp.par.outlined.exit

omp.par.outlined.exit:                            ; preds = %omp_parallel
  br label %omp.par.exit.split

omp.par.exit.split:                               ; preds = %omp.par.outlined.exit
  br label %7

6:                                                ; preds = %entry
  call void @__kmpc_serialized_parallel(ptr @1, i32 %omp_global_thread_num)
  call void @sb_..omp_par(ptr %tid.addr, ptr %zero.addr, ptr %structArg), !dbg !9
  call void @__kmpc_end_serialized_parallel(ptr @1, i32 %omp_global_thread_num)
  br label %7

7:                                                ; preds = %6, %omp.par.exit.split
  ret void, !dbg !10
}
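
To make the suggestion above concrete, here is a minimal C++ model of the two call paths, with the aggregate filled on both of them. It is only a sketch of the idea raised in the comment, not the change that landed and not real OpenMP runtime code: `StructArg`, `outlined`, `fake_fork_call`, and `run` are hypothetical stand-ins for `%structArg`, `@sb_..omp_par`, `__kmpc_fork_call`, and `@sb_`.

```cpp
#include <cassert>

// Stand-in for the `{ ptr }` aggregate in the IR above.
struct StructArg {
  int *captured;
};

// Stand-in for the outlined function: reads its input through the aggregate.
static void outlined(int *tid, int *bound_tid, void *raw) {
  (void)tid;
  (void)bound_tid;
  auto *s = static_cast<StructArg *>(raw);
  assert(s->captured != nullptr && "aggregate must be filled before the call");
  *s->captured += 1;
}

// Stand-in for the fork call: invokes the outlined function once.
static void fake_fork_call(void (*fn)(int *, int *, void *), void *arg) {
  int gtid = 0, btid = 0;
  fn(&gtid, &btid, arg);
}

int run(bool if_cond, int x) {
  StructArg structArg{nullptr};  // allocated once, like the entry-block alloca
  int tid = 0, zero = 0;
  if (if_cond) {
    // Parallel path: the store already sits next to the fork call.
    structArg.captured = &x;
    fake_fork_call(outlined, &structArg);
  } else {
    // Serialized path: the same store copied here, as proposed in the comment.
    structArg.captured = &x;
    outlined(&tid, &zero, &structArg);
  }
  return x;
}
```

Removing the store from the `else` branch reproduces the reported problem in this model: the serialized call would see a null `captured` pointer and trip the assertion. Hoisting the single store above the branch would be an alternative with the same effect here.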