This patch modifies code generation in OpenMPIRBuilder so that the arguments to the outlined parallel-region function are passed in an aggregate (struct), in addition to the global_tid and bound_tid arguments. It depends on the updated CodeExtractor (see D96854) and mirrors the behavior of Clang codegen (see D102107).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
If I understand this correctly, the aggregate struct is filled in only for the parallel case, not for the serialized parallel version. In the example below, the call to @sb_..omp_par through __kmpc_fork_call in block omp_parallel has the aggregate filled in, while the direct call to @sb_..omp_par in the serialized parallel region in block 6 does not. I assume the extractor creates these stores and emits them only at the call site it rewrites. Would copying the instructions that fill the aggregate into the serialized parallel region be a reasonable solution?
CC: @Meinersbur
define void @sb_(ptr %0, ptr %1) !dbg !3 {
  %structArg = alloca { ptr }, align 8
  %tid.addr = alloca i32, align 4, !dbg !7
  %zero.addr = alloca i32, align 4, !dbg !7
  store i32 0, ptr %tid.addr, align 4, !dbg !7
  store i32 0, ptr %zero.addr, align 4, !dbg !7
  %3 = load i32, ptr %0, align 4, !dbg !7
  %4 = icmp ne i32 %3, 0, !dbg !7
  br label %entry, !dbg !7

entry:                                            ; preds = %2
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1), !dbg !7
  br i1 %4, label %5, label %6

5:                                                ; preds = %entry
  br label %omp_parallel

omp_parallel:                                     ; preds = %5
  %gep_ = getelementptr { ptr }, ptr %structArg, i32 0, i32 0
  store ptr %1, ptr %gep_, align 8
  call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @1, i32 1, ptr @sb_..omp_par, ptr %structArg), !dbg !9
  br label %omp.par.outlined.exit

omp.par.outlined.exit:                            ; preds = %omp_parallel
  br label %omp.par.exit.split

omp.par.exit.split:                               ; preds = %omp.par.outlined.exit
  br label %7

6:                                                ; preds = %entry
  call void @__kmpc_serialized_parallel(ptr @1, i32 %omp_global_thread_num)
  call void @sb_..omp_par(ptr %tid.addr, ptr %zero.addr, ptr %structArg), !dbg !9
  call void @__kmpc_end_serialized_parallel(ptr @1, i32 %omp_global_thread_num)
  br label %7

7:                                                ; preds = %6, %omp.par.exit.split
  ret void, !dbg !10
}
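To make the concern concrete, here is a rough C++ model of the control flow in the IR above. It is only a sketch, not OpenMPIRBuilder or runtime code: `StructArg`, `outlined`, and `run_region` are hypothetical names standing in for `%structArg`, `@sb_..omp_par`, and `@sb_`, the fork call is modeled as a plain direct call, and the aggregate is initialized to `nullptr` so the unfilled case is observable rather than undefined as it would be with the uninitialized alloca.

```cpp
// Hypothetical stand-in for the aggregate `{ ptr }` behind %structArg.
struct StructArg {
  int *captured; // the single `ptr` field of the aggregate
};

// Hypothetical stand-in for the outlined function @sb_..omp_par.
// It reads the captured pointer from the aggregate and writes through it.
void outlined(int *global_tid, int *bound_tid, StructArg *arg) {
  (void)global_tid;
  (void)bound_tid;
  if (arg->captured) // guard so the unfilled case stays well defined here
    *arg->captured += 1;
}

// Models the two branches of @sb_. `fill_in_serial` toggles the fix
// proposed in the comment (copying the gep + store into block 6).
// Returns true if the outlined body actually updated `*shared`.
bool run_region(bool if_cond, bool fill_in_serial, int *shared) {
  StructArg structArg{nullptr}; // assumption: nullptr instead of garbage
  int tid = 0, zero = 0;
  const int before = *shared;

  if (if_cond) {
    // Block omp_parallel: the aggregate is filled right before the call.
    structArg.captured = shared;
    outlined(&tid, &zero, &structArg); // stands in for __kmpc_fork_call
  } else {
    // Block 6: as emitted, nothing fills the aggregate on this path.
    if (fill_in_serial)
      structArg.captured = shared; // the proposed copy of the fill
    outlined(&tid, &zero, &structArg);
  }
  return *shared == before + 1;
}
```

In this model, `run_region(false, false, &x)` leaves `x` untouched because the serialized path never fills the aggregate, while `run_region(false, true, &x)` behaves like the parallel path. Hoisting the fill above the branch would be another way to cover both paths, though whether CodeExtractor can be made to emit it there is not something this sketch addresses.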