This is an archive of the discontinued LLVM Phabricator instance.


Differential D86155

[OpenMPOpt][SplitMemTransfer] Moving the "wait" down
Closed, Public

Authored by hamax97 on Aug 18 2020, 10:33 AM.

Details

Reviewers

jdoerfert
sstefan1

Commits

rGbd2fa1819b9d: [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of…

Summary

canBeMovedDownwards checks whether the "wait" counterpart of the runtime call can be moved downwards. It returns a pointer to the first instruction that might require or modify the transferred data, or null if the movement is not possible or not worth it. splitTargetDataBeginRTC receives that instruction and, instead of moving the "wait", creates it at that point.
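
As a rough illustration of the scan described in the summary, the following self-contained C++ sketch models a basic block as a vector of toy instructions. `ToyInst`, `findWaitPoint`, and the `-1` sentinel are hypothetical stand-ins introduced here for illustration; the patch itself walks `llvm::Instruction` nodes and returns a pointer or nullptr.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::Instruction: only the one property the
// scan inspects is modeled.
struct ToyInst {
  bool HasSideEffects;
};

// Returns the index of the instruction before which the "wait" would be
// created, or -1 if the wait cannot be moved past even one instruction.
int findWaitPoint(const std::vector<ToyInst> &Block, int CallIdx) {
  assert(CallIdx >= 0 && CallIdx < static_cast<int>(Block.size()) &&
         "the runtime call must be inside the block");
  bool IsWorthIt = false;
  for (int I = CallIdx + 1; I < static_cast<int>(Block.size()); ++I) {
    if (Block[I].HasSideEffects)
      return IsWorthIt ? I : -1;
    // Moving over any side-effect-free instruction counts as worthwhile.
    IsWorthIt = true;
  }
  // Nothing blocked the movement: use the last instruction of the block.
  return static_cast<int>(Block.size()) - 1;
}
```

In this model, a block whose instruction right after the call has side effects yields `-1` (no split), while a block with a few side-effect-free instructions in between yields the index of the first blocking instruction.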

Diff Detail

Event Timeline

hamax97 created this revision. Aug 18 2020, 10:33 AM

Herald added a project: Restricted Project. Aug 18 2020, 10:33 AM

Herald added subscribers: llvm-commits, guansong, hiraditya, yaxunl.

hamax97 requested review of this revision. Aug 18 2020, 10:33 AM

jdoerfert: We have to check "mayReadMemory" as well, I think.

hamax97 (in reply to D86155#2224292): Is it really necessary? I think we only care whether the instruction writes to memory. If I add it, the runtime call is split only in heavyComputation1 and dataTransferOnly1, because in the other test functions the instruction immediately after __tgt_target_data_begin_mapper is a load.

In reply to D86155#2224474: Not splitting in this example is OK. A load is a problem if the transfer is to the issuing device; a store is always a problem.

Adding constraint mayReadFromMemory to canBeMovedDownwards.
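
The effect of this constraint on the test functions can be sketched with a toy model. Everything below is an assumption made for illustration: `ToyInst`, `instructionsSkipped`, and the `CheckReads` flag are hypothetical, and `MayWrite` only approximates the side-effect check used in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::Instruction with two memory properties.
struct ToyInst {
  bool MayRead;
  bool MayWrite;
};

// Counts how many instructions after the call the "wait" can move past.
// With CheckReads set, a load stops the movement just like a store does.
int instructionsSkipped(const std::vector<ToyInst> &Block, int CallIdx,
                        bool CheckReads) {
  assert(CallIdx >= 0 && CallIdx < static_cast<int>(Block.size()) &&
         "the runtime call must be inside the block");
  int Skipped = 0;
  for (int I = CallIdx + 1; I < static_cast<int>(Block.size()); ++I) {
    const ToyInst &Inst = Block[I];
    if (Inst.MayWrite || (CheckReads && Inst.MayRead))
      break;
    ++Skipped;
  }
  return Skipped;
}
```

A sequence shaped like call, bitcast, load, getelementptr, bitcast, store lets the wait pass four instructions when only writes are checked, and one when reads are checked too. A sequence whose first instruction after the call is a load lets it pass none once reads are checked, which matches the observation that those functions are no longer split.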

Harbormaster completed remote builds in B68796: Diff 286383. Aug 18 2020, 1:59 PM

jdoerfert added inline comments. Aug 18 2020, 2:12 PM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
719	Can't you just check if CurrentI is RuntimeCall?
737	You cannot look for the target calls. What if you encounter a call to a function that contains such a call? However, as long as we don't allow side effects this should not be an issue, so just remove this part. You do need to check whether a call has the `nosync` attribute, though.
746	(On the suggested edit to the return statement.) Or something else that looks less confusing.
752	Style: Just make them references instead of pointers.
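
One possible reading of the comment on line 737 is a single predicate that replaces the comparison against specific runtime functions. The sketch below is hypothetical: `ToyInst` and `stopsWaitMovement` are names invented here, and the boolean fields stand in for queries on a real instruction.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Instruction.
struct ToyInst {
  bool IsCall;
  bool HasSideEffects;
  bool HasNoSync; // Models a call carrying the nosync attribute.
};

// The wait must not move past anything with side effects, nor past a call
// that is not known to be free of synchronization.
bool stopsWaitMovement(const ToyInst &I) {
  if (I.HasSideEffects)
    return true;
  return I.IsCall && !I.HasNoSync;
}

// Expected classification for a few representative cases.
void checkExamples() {
  assert(!stopsWaitMovement(ToyInst{false, false, false})); // plain arithmetic
  assert(stopsWaitMovement(ToyInst{true, false, false}));   // call, may sync
  assert(!stopsWaitMovement(ToyInst{true, false, true}));   // nosync call
  assert(stopsWaitMovement(ToyInst{true, true, true}));     // side effects win
}
```

Under this rule a call into a function that itself contains a target call is handled conservatively, because such a call would either have side effects or lack `nosync`.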

hamax97 added inline comments. Aug 18 2020, 4:16 PM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
719	I think it's never RuntimeCall. It starts with the instruction after RuntimeCall.

jdoerfert added inline comments. Aug 18 2020, 4:21 PM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
719	OK, keep it.

Removing comparison with explicit runtime functions from canBeMovedDownwards.
Minor refactors to comments and arguments.

LGTM

This revision is now accepted and ready to land. Aug 18 2020, 6:40 PM

sstefan1 added inline comments. Aug 19 2020, 12:28 AM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
717	nit: alias
723	typo: alias

sstefan1 added inline comments. Aug 19 2020, 12:30 AM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
723	ignore this one, sorry

Closed by commit rGbd2fa1819b9d: [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of… (authored by hamax97). Aug 19 2020, 9:43 AM

This revision was automatically updated to reflect the committed changes.

Hamilton Tobon Mosquera <htobonmm7@gmail.com> added a commit: rGbd2fa1819b9d: [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of….

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/OpenMPOpt.cpp	59 lines
llvm/test/Transforms/OpenMP/hide_mem_transfer_latency.ll	16 lines

Diff 286340

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

[683 lines not shown]
  private:

    bool hideMemTransfersLatency() {
      auto &RFI = OMPInfoCache.RFIs[OMPRTL___tgt_target_data_begin_mapper];
      bool Changed = false;
      auto SplitMemTransfers = [&](Use &U, Function &Decl) {
        auto *RTCall = getCallIfRegularCall(U, &RFI);
        if (!RTCall)
          return false;
-       bool WasSplit = splitTargetDataBeginRTC(RTCall);
+       // TODO: Check if can be moved upwards.
+       bool WasSplit = false;
+       Instruction *WaitMovementPoint = canBeMovedDownwards(RTCall);
+       if (WaitMovementPoint)
+         WasSplit = splitTargetDataBeginRTC(RTCall, WaitMovementPoint);
        Changed |= WasSplit;
        return WasSplit;
      };
      RFI.foreachUse(SCC, SplitMemTransfers);
      return Changed;
    }
+   /// Returns the instruction where the "wait" counterpart \p RuntimeCall can be
+   /// moved. Returns nullptr if the movement is not possible, or not worth it.
+   Instruction *canBeMovedDownwards(CallInst *RuntimeCall) {
+     // FIXME: This traverses only the BasicBlock where RuntimeCall is.
+     // Make it traverse the CFG.
+     // Functions that may require the data transferred or may synchronize it.
+     auto *TargetTeams = OMPInfoCache.OMPBuilder.getOrCreateRuntimeFunctionPtr(
+         llvm::omp::RuntimeFunction::OMPRTL___tgt_target_teams_mapper);
+     auto *TargetDataEnd = OMPInfoCache.OMPBuilder.getOrCreateRuntimeFunctionPtr(
+         llvm::omp::RuntimeFunction::OMPRTL___tgt_target_data_end_mapper);

sstefan1 (Not Done): nit: alias

+     Instruction *CurrentI = RuntimeCall;
+     bool IsWorthIt = false;

jdoerfert (Not Done): Can't you just check if CurrentI is RuntimeCall?

hamax97 (Author, Done): I think it's never RuntimeCall. It starts with the instruction after RuntimeCall.

jdoerfert (Not Done): OK, keep it.

+     while ((CurrentI = CurrentI->getNextNode())) {
+       // TODO: Once we detect the regions to be offloaded we should use the
+       // alyas analysis manager to check if CurrentI may modify one of

sstefan1 (Not Done): typo: alias

sstefan1 (Not Done): ignore this one, sorry

+       // the offloaded regions.
+       if (CurrentI->mayHaveSideEffects()) {
+         if (IsWorthIt)
+           return CurrentI;
+         return nullptr;
+       }
+       if (auto *C = dyn_cast<CallInst>(CurrentI)) {
+         auto *Callee = C->getCalledFunction();
+         if (Callee == TargetTeams || Callee == TargetDataEnd) {
+           if (IsWorthIt)
+             return CurrentI;
+           return nullptr;
+         }

jdoerfert (Not Done): You cannot look for the target calls. What if you encounter a call to a function that contains such a call? However, as long as we don't allow side effects this should not be an issue, just remove this part. You need to check if it is a call that it has a nosync attribute though.

+       }
+       // FIXME: For now if we move it over anything without side effect
+       // is worth it.
+       IsWorthIt = true;
+     }
+     // Return end of BasicBlock.
+     return &*(--RuntimeCall->getParent()->end());

jdoerfert (Not Done), suggested edit:

        // Return end of BasicBlock.
    -   return &*(--RuntimeCall->getParent()->end());
    +   return RuntimeCall->getParent()->getTerminator()->getPrevNode();
      }
      /// Splits \p RuntimeCall into its "issue" and "wait" counterparts.

Or something else that looks less confusing.

+   }

    /// Splits \p RuntimeCall into its "issue" and "wait" counterparts.
-   bool splitTargetDataBeginRTC(CallInst *RuntimeCall) {
+   bool splitTargetDataBeginRTC(CallInst *RuntimeCall,
+                                Instruction *WaitMovementPoint) {
+     assert(WaitMovementPoint && "No place to move the split runtime call!");

jdoerfert (Not Done): Style: Just make them references instead of pointers.

      auto &IRBuilder = OMPInfoCache.OMPBuilder;
      // Add "issue" runtime call declaration:
      // declare %struct.tgt_async_info @__tgt_target_data_begin_issue(i64, i32,
      //   i8**, i8**, i64*, i64*)
      FunctionCallee IssueDecl = IRBuilder.getOrCreateRuntimeFunction(
          M, OMPRTL___tgt_target_data_begin_mapper_issue);
      // Change RuntimeCall call site for its asynchronous version.
[10 lines not shown]
      FunctionCallee WaitDecl = IRBuilder.getOrCreateRuntimeFunction(
          M, OMPRTL___tgt_target_data_begin_mapper_wait);
      // Add call site to WaitDecl.
      Value *WaitParams[2] = {
          IssueCallsite->getArgOperand(0), // device_id.
          IssueCallsite                    // returned handle.
      };
-     CallInst::Create(WaitDecl, WaitParams, /*NameStr=*/"",
-                      IssueCallsite->getNextNode());
+     CallInst::Create(
+         WaitDecl, WaitParams, /*NameStr=*/"", WaitMovementPoint);
      return true;
    }
    static Value *combinedIdentStruct(Value *CurrentIdent, Value *NextIdent,
                                      bool GlobalOnly, bool &SingleChoice) {
      if (CurrentIdent == NextIdent)
        return CurrentIdent;

[857 lines not shown]

llvm/test/Transforms/OpenMP/hide_mem_transfer_latency.ll

[53 lines not shown]
  ; CHECK-NEXT: %1 = getelementptr inbounds [1 x i8*], [1 x i8*]* %.offload_baseptrs, i64 0, i64 0
  ; CHECK-NEXT: %2 = bitcast [1 x i8*]* %.offload_baseptrs to double**
  ; CHECK-NEXT: store double* %a, double** %2, align 8
  ; CHECK-NEXT: %3 = getelementptr inbounds [1 x i8*], [1 x i8*]* %.offload_ptrs, i64 0, i64 0
  ; CHECK-NEXT: %4 = bitcast [1 x i8*]* %.offload_ptrs to double**
  ; CHECK-NEXT: store double* %a, double** %4, align 8
  ; CHECK-NEXT: %handle = call %struct.__tgt_async_info @__tgt_target_data_begin_mapper_issue(i64 -1, i32 1, i8** %1, i8** %3, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @.offload_sizes.1, i64 0, i64 0), i64* getelementptr inbounds ([1 x i64], [1 x i64]* @.offload_maptypes, i64 0, i64 0), i8** null)
- ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: %5 = bitcast double* %a to i64*
  ; CHECK-NEXT: %6 = load i64, i64* %5, align 8
  ; CHECK-NEXT: %7 = getelementptr inbounds [1 x i8*], [1 x i8*]* %.offload_baseptrs4, i64 0, i64 0
  ; CHECK-NEXT: %8 = bitcast [1 x i8*]* %.offload_baseptrs4 to i64*
+ ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: store i64 %6, i64* %8, align 8
  ; CHECK-NEXT: %9 = getelementptr inbounds [1 x i8*], [1 x i8*]* %.offload_ptrs5, i64 0, i64 0
  ; CHECK-NEXT: %10 = bitcast [1 x i8*]* %.offload_ptrs5 to i64*
  ; CHECK-NEXT: store i64 %6, i64* %10, align 8
  ; CHECK-NEXT: %11 = call i32 @__tgt_target_teams_mapper(i64 -1, i8* nonnull @.__omp_offloading_heavyComputation1.region_id, i32 1, i8** nonnull %7, i8** nonnull %9, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @.offload_sizes.1, i64 0, i64 0), i64* getelementptr inbounds ([1 x i64], [1 x i64]* @.offload_maptypes.2, i64 0, i64 0), i8** null, i32 0, i32 0)
  ; CHECK-NEXT: %.not = icmp eq i32 %11, 0
  ; CHECK-NEXT: br i1 %.not, label %omp_offload.cont, label %omp_offload.failed
  ; CHECK: omp_offload.failed:
[104 lines not shown]
  ; CHECK-NEXT: store i32* %size.addr, i32** %7, align 8
  ; CHECK-NEXT: %8 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_ptrs, i64 0, i64 1
  ; CHECK-NEXT: %9 = bitcast i8** %8 to i32**
  ; CHECK-NEXT: store i32* %size.addr, i32** %9, align 8
  ; CHECK-NEXT: %10 = getelementptr inbounds [2 x i64], [2 x i64]* %.offload_sizes, i64 0, i64 1
  ; CHECK-NEXT: store i64 4, i64* %10, align 8
  ; CHECK-NEXT: %handle = call %struct.__tgt_async_info @__tgt_target_data_begin_mapper_issue(i64 -1, i32 2, i8** %1, i8** %3, i64* %5, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @.offload_maptypes.3, i64 0, i64 0), i8** null)
- ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: %11 = load i32, i32* %size.addr, align 4
  ; CHECK-NEXT: %size.casted = zext i32 %11 to i64
  ; CHECK-NEXT: %12 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_baseptrs2, i64 0, i64 0
  ; CHECK-NEXT: %13 = bitcast [2 x i8*]* %.offload_baseptrs2 to i64*
+ ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: store i64 %size.casted, i64* %13, align 8
  ; CHECK-NEXT: %14 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_ptrs3, i64 0, i64 0
  ; CHECK-NEXT: %15 = bitcast [2 x i8*]* %.offload_ptrs3 to i64*
  ; CHECK-NEXT: store i64 %size.casted, i64* %15, align 8
  ; CHECK-NEXT: %16 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_baseptrs2, i64 0, i64 1
  ; CHECK-NEXT: %17 = bitcast i8** %16 to double**
  ; CHECK-NEXT: store double* %a, double** %17, align 8
  ; CHECK-NEXT: %18 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_ptrs3, i64 0, i64 1
[119 lines not shown]
  ; CHECK-NEXT: store i32* %size.addr, i32** %7, align 8
  ; CHECK-NEXT: %8 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_ptrs, i64 0, i64 1
  ; CHECK-NEXT: %9 = bitcast i8** %8 to i32**
  ; CHECK-NEXT: store i32* %size.addr, i32** %9, align 8
  ; CHECK-NEXT: %10 = getelementptr inbounds [2 x i64], [2 x i64]* %.offload_sizes, i64 0, i64 1
  ; CHECK-NEXT: store i64 4, i64* %10, align 8
  ; CHECK-NEXT: %handle = call %struct.__tgt_async_info @__tgt_target_data_begin_mapper_issue(i64 -1, i32 2, i8** %1, i8** %3, i64* %5, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @.offload_maptypes.3, i64 0, i64 0), i8** null)
- ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: %11 = load i32, i32* %size.addr, align 4
  ; CHECK-NEXT: %size.casted = zext i32 %11 to i64
  ; CHECK-NEXT: %12 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_baseptrs2, i64 0, i64 0
  ; CHECK-NEXT: %13 = bitcast [2 x i8*]* %.offload_baseptrs2 to i64*
+ ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: store i64 %size.casted, i64* %13, align 8
  ; CHECK-NEXT: %14 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_ptrs3, i64 0, i64 0
  ; CHECK-NEXT: %15 = bitcast [2 x i8*]* %.offload_ptrs3 to i64*
  ; CHECK-NEXT: store i64 %size.casted, i64* %15, align 8
  ; CHECK-NEXT: %16 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_baseptrs2, i64 0, i64 1
  ; CHECK-NEXT: %17 = bitcast i8** %16 to double**
  ; CHECK-NEXT: store double* %a, double** %17, align 8
  ; CHECK-NEXT: %18 = getelementptr inbounds [2 x i8*], [2 x i8*]* %.offload_ptrs3, i64 0, i64 1
[105 lines not shown]
  ; CHECK-NEXT: store double* %a, double** %2, align 8
  ; CHECK-NEXT: %3 = getelementptr inbounds [1 x i8*], [1 x i8*]* %.offload_ptrs, i64 0, i64 0
  ; CHECK-NEXT: %4 = bitcast [1 x i8*]* %.offload_ptrs to double**
  ; CHECK-NEXT: store double* %a, double** %4, align 8
  ; CHECK-NEXT: %5 = getelementptr inbounds [1 x i64], [1 x i64]* %.offload_sizes, i64 0, i64 0
  ; CHECK-NEXT: store i64 %0, i64* %5, align 8
  ; CHECK-NEXT: %handle = call %struct.__tgt_async_info @__tgt_target_data_begin_mapper_issue(i64 -1, i32 1, i8** %1, i8** %3, i64* %5, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @.offload_maptypes.5, i64 0, i64 0), i8** null)
- ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: %rem = urem i32 %call, %size
+ ; CHECK-NEXT: call void @__tgt_target_data_begin_mapper_wait(i64 -1, %struct.__tgt_async_info %handle)
  ; CHECK-NEXT: call void @__tgt_target_data_end_mapper(i64 -1, i32 1, i8** nonnull %1, i8** nonnull %3, i64* nonnull %5, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @.offload_maptypes.5, i64 0, i64 0), i8** null)
  ; CHECK-NEXT: ret i32 %rem
  ;
  entry:
  %.offload_baseptrs = alloca [1 x i8*], align 8
  %.offload_ptrs = alloca [1 x i8*], align 8
  %.offload_sizes = alloca [1 x i64], align 8

[29 lines not shown]
