This is for hiding memory transfer latency when offloading to a device. The idea is to split
the __tgt_target_data_begin_mapper function to issue and wait and call
the issue function as early as possible in the program.
Details
- Reviewers
jdoerfert
Diff Detail
Event Timeline
Mostly minor comments that should be addressed. Also, test are missing.
Maybe split it into the runtime part and the compiler part to make the patches smaller.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
4452 | Put all of this into an anonymous namespace or make them member functions of OpenMPOpt, or a mix of the two. Also add documentation describing what is happening briefly. | |
4459 | Use DL.getAllocaAddrSpace() to get the AddressSpace, with DL being the data layout (const DataLayout &DL = M->getDataLayout();) | |
4479 | Put the comment in the line before the value. | |
4488 | Don't return a container, pass it in by reference instead. | |
4507 | This operation is going to be costly, can we avoid it? | |
4512 | Pass containers by reference to avoid copies. | |
4522 | Why didn't we do this right away when we created the UseTree vector? | |
4526 | Avoid duplication. RuntimeCall is fine to be used instead below. | |
4545 | isn't front() just RuntimeCall? If so, use it. | |
4569 | Use a protected namespace name, e.g., "__openmp_mapper_issue_wrapper_" + MapperBB->getParent()->getName() | |
4585 | Move this code into the above function. | |
4599 | This is not a good idea. Instead, return the calls as part of the splitting. This won't work if you split multiple times in the same function and it also is really costly. | |
4603 | I don't recall why we have the callbase + read condition below. Maybe remove it for now? | |
4609–4612 | ||
4633 | You need to keep track of the blocks you visited or loops will cause this to continue forever. Once you have a visited set, you can also pop the last SuccVec member always as you don't need all of them. | |
4640 | Linear searches are generally not a good idea. I'll comment on the call site. | |
4672 | ||
4698–4701 | ||
4733 | The return value of this function is defined as: ┊ /// Run the callback \p CB on each use within the function \p F and forget ┊ /// the use if the result is true. The Changed variable is good to keep track overall but each use that is removed has to be reported via the return value of the callback. | |
4747 | Avoid these trailing comments. | |
4752 | Also here, return the call from outlineMapperIssueRT or go through the users of IssueWrapper, don't "search" for the call by name. | |
4754 | This can be done in the outlineMapperIssueRT function. | |
4759 | Style: IssueMoved... | |
4768 | Use FAM to get the DominatorTree. | |
4786 | The move code here should go into the moveIssue helper. No need to split it into two parts. | |
5301 | You should move this into the run method of OpenMPOpt as there is also the split part, no? | |
openmp/libomptarget/include/omptarget.h | ||
190 ↗ | (On Diff #453340) | |
193 ↗ | (On Diff #453340) | |
203 ↗ | (On Diff #453340) | unrelated. |
292 ↗ | (On Diff #453340) | newline. Also, add a comment above both to describe that what they do. |
openmp/libomptarget/src/interface.cpp | ||
110 ↗ | (On Diff #453340) | make it if (SUCCESS) { if (handle) ... else ... } to help people read it. |
121 ↗ | (On Diff #453340) | Modify the text to make sure we know we are in the wait now |
128 ↗ | (On Diff #453340) | We need to extract the logic to get the DeviceId from the above function. |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
4667 | Similar to the other comment, keep a visited set. that also allows to avoid the linear search in the vector which causes quadratic behavior here since you do it for every entry. | |
4726 | This should be combined with the issue rtc in orig bb, maybe use a flag. | |
4786 | At least have a single helper to move the issue stuff, which then call call the block helper if we need to split it. |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1390 | Should add an amdgpu test and you probably need an aaddrspacecast. I just fixed the same problem in c3054aeb5a3ba7778b1296722cfb90b494819b60 |
Should add an amdgpu test and you probably need an aaddrspacecast. I just fixed the same problem in c3054aeb5a3ba7778b1296722cfb90b494819b60