This is an archive of the discontinued LLVM Phabricator instance.

[OpenMPOpt] Improving memory transfer latency hiding
Needs ReviewPublic

Authored by dtalaashrafi on Aug 17 2022, 11:07 AM.

Details

Reviewers
jdoerfert
Summary

This is for hiding memory transfer latency when offloading to a device. The idea is to split
the __tgt_target_data_begin_mapper function to issue and wait and call
the issue function as early as possible in the program.

Diff Detail

Event Timeline

dtalaashrafi created this revision.Aug 17 2022, 11:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2022, 11:07 AM
dtalaashrafi requested review of this revision.Aug 17 2022, 11:07 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 17 2022, 11:07 AM
dtalaashrafi edited the summary of this revision. (Show Details)Aug 18 2022, 3:59 PM
dtalaashrafi edited the summary of this revision. (Show Details)Aug 18 2022, 4:04 PM

Mostly minor comments that should be addressed. Also, test are missing.
Maybe split it into the runtime part and the compiler part to make the patches smaller.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4452

Put all of this into an anonymous namespace or make them member functions of OpenMPOpt, or a mix of the two.

Also add documentation describing what is happening briefly.

4459

Use DL.getAllocaAddrSpace() to get the AddressSpace, with DL being the data layout (const DataLayout &DL = M->getDataLayout();)

4479

Put the comment in the line before the value.
Also add some assertions to verify their type matches what you expect.

4488

Don't return a container, pass it in by reference instead.

4507

This operation is going to be costly, can we avoid it?

4512

Pass containers by reference to avoid copies.

4522

Why didn't we do this right away when we created the UseTree vector?

4526

Avoid duplication. RuntimeCall is fine to be used instead below.

4545

isn't front() just RuntimeCall? If so, use it.

4569

Use a protected namespace name, e.g.,

"__openmp_mapper_issue_wrapper_" + MapperBB->getParent()->getName()

4585

Move this code into the above function.

4599

This is not a good idea. Instead, return the calls as part of the splitting. This won't work if you split multiple times in the same function and it also is really costly.

4603

I don't recall why we have the callbase + read condition below. Maybe remove it for now?

4609–4612
4633

You need to keep track of the blocks you visited or loops will cause this to continue forever.

Once you have a visited set, you can also pop the last SuccVec member always as you don't need all of them.

4640

Linear searches are generally not a good idea. I'll comment on the call site.

4672
4698–4701
4733

The return value of this function is defined as:

┊ /// Run the callback \p CB on each use within the function \p F and forget
┊ /// the use if the result is true.

The Changed variable is good to keep track overall but each use that is removed has to be reported via the return value of the callback.

4747

Avoid these trailing comments.

4752

Also here, return the call from outlineMapperIssueRT or go through the users of IssueWrapper, don't "search" for the call by name.

4754

This can be done in the outlineMapperIssueRT function.

4759

Style: IssueMoved...
Also other places, like split above.

4768

Use FAM to get the DominatorTree.

4786

The move code here should go into the moveIssue helper. No need to split it into two parts.

5301

You should move this into the run method of OpenMPOpt as there is also the split part, no?
This should be combined with (or replace) the existing code in hideMemTransfersLatency

openmp/libomptarget/include/omptarget.h
190 ↗(On Diff #453340)
193 ↗(On Diff #453340)
203 ↗(On Diff #453340)

unrelated.

292 ↗(On Diff #453340)

newline. Also, add a comment above both to describe that what they do.

openmp/libomptarget/src/interface.cpp
110 ↗(On Diff #453340)

make it

if (SUCCESS) {
  if (handle)
    ...
  else
    ...
}

to help people read it.

121 ↗(On Diff #453340)

Modify the text to make sure we know we are in the wait now

128 ↗(On Diff #453340)

We need to extract the logic to get the DeviceId from the above function.

jdoerfert added inline comments.Aug 24 2022, 10:05 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4667

Similar to the other comment, keep a visited set. that also allows to avoid the linear search in the vector which causes quadratic behavior here since you do it for every entry.
We also need to cache the accessibility queries. Probably best to go from B upwards instead of Dom downwards. You will only visit the blocks that can reach B that way without needing any check.

4726

This should be combined with the issue rtc in orig bb, maybe use a flag.

4786

At least have a single helper to move the issue stuff, which then call call the block helper if we need to split it.

dtalaashrafi marked 35 inline comments as done.
dtalaashrafi marked an inline comment as done.
arsenm added a subscriber: arsenm.Jan 3 2023, 3:38 PM
arsenm added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1390

Should add an amdgpu test and you probably need an aaddrspacecast. I just fixed the same problem in c3054aeb5a3ba7778b1296722cfb90b494819b60