This is an archive of the discontinued LLVM Phabricator instance.

Paths

openmp/libomptarget/src/: CMakeLists.txt, device.cpp, memory.h, memory.cpp, omptarget.cpp, rtl.cpp

Differential D81054

[OpenMP] Introduce target memory manager
ClosedPublic

Authored by tianshilei1992 on Jun 2 2020, 10:12 PM.


Details

Reviewers

jdoerfert
ye-luo
JonChesterfield

Commits

rG0289696751e9: [OpenMP] Introduce target memory manager

Summary

This patch introduces a target memory manager that keeps device memory around instead of freeing it as soon as it is no longer used, because device allocation and deallocation are expensive. On CUDA devices, cuMemFree even blocks context switching on the device, which hurts concurrent kernel execution.

The memory manager can be thought of as a memory pool. It divides the pool into multiple buckets by size, so that allocations and frees that fall into different buckets do not interfere with each other.
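The following is a rough illustration of the bucketing idea. This minimal C++ sketch maps a request size to a bucket by rounding the size down to a power of two; the review below mentions a helper called `flp2` for this. The names `findPreviousPowerOfTwo`, `bucketIndex`, and `NumBuckets`, as well as the bucket count, are assumptions for illustration and do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bucket count: bucket I holds buffers whose size lies in
// [2^I, 2^(I+1)); the last bucket collects everything larger.
constexpr std::size_t NumBuckets = 24;

// Returns the largest power of two that is <= Num (0 for Num == 0).
// Smears the highest set bit into all lower bits, then keeps only the top one.
std::uint64_t findPreviousPowerOfTwo(std::uint64_t Num) {
  Num |= Num >> 1;
  Num |= Num >> 2;
  Num |= Num >> 4;
  Num |= Num >> 8;
  Num |= Num >> 16;
  Num |= Num >> 32;
  return Num - (Num >> 1);
}

// Maps a request size to a bucket index: floor(log2(Size)), clamped to the
// last bucket.
std::size_t bucketIndex(std::uint64_t Size) {
  std::uint64_t P = findPreviousPowerOfTwo(Size);
  std::size_t Index = 0;
  while (P > 1) {
    P >>= 1;
    ++Index;
  }
  return Index < NumBuckets ? Index : NumBuckets - 1;
}
```

Because each bucket covers one power-of-two range, two requests of very different sizes touch different buckets and can use separate locks.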

In this version, we use an exact-equality policy to find a free buffer. This is an open question: would best-fit work better here? IMO, best-fit is not a good fit for target memory management, because GPU computation usually involves GBs of data, and best-fit could waste a lot of memory. For example, suppose there is a free buffer of 1960MB and we now need a 1200MB buffer. With best-fit, the free buffer would be returned, wasting 760MB.
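The sketch below contrasts exact-size reuse with best-fit reuse on a size-ordered `std::multiset`, which the discussion below says the patch uses for its free lists. To keep it short, the free list here stores bare sizes rather than buffers. `takeExact` and `takeBestFit` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>

// Hypothetical free list holding only buffer sizes (in MB for this example).
using FreeListTy = std::multiset<std::size_t>;

// Exact-equality policy: reuse a cached buffer only if its size matches.
std::optional<std::size_t> takeExact(FreeListTy &FreeList, std::size_t Size) {
  auto It = FreeList.find(Size); // O(log n)
  if (It == FreeList.end())
    return std::nullopt;
  std::size_t Found = *It;
  FreeList.erase(It); // erases this one element, not every equal key
  return Found;
}

// Best-fit policy for comparison: smallest cached buffer with size >= Size.
std::optional<std::size_t> takeBestFit(FreeListTy &FreeList, std::size_t Size) {
  auto It = FreeList.lower_bound(Size); // O(log n)
  if (It == FreeList.end())
    return std::nullopt;
  std::size_t Found = *It;
  FreeList.erase(It);
  return Found;
}
```

Take a single cached 1960MB buffer. `takeExact(FreeList, 1200)` finds nothing and leaves the cache intact. `takeBestFit(FreeList, 1200)` hands out the 1960MB buffer, so 760MB of it sits unused.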

A new allocation on the device happens only when no suitable free buffer is left. Memory is freed on the device in the following two cases:

First, when the program ends, obviously. However, there is a small problem: the plugin library is destroyed before the memory manager is, so the calls into the target plugin will not succeed.

Second, when the device is out of memory as we request new memory. The manager walks through all free buffers, starting from the bucket with the largest base size, picks one buffer, frees it, and immediately tries to allocate again. If that succeeds, it returns right away rather than freeing all buffers in the free lists.
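The out-of-memory path can be sketched as follows. This is a simplified, single-threaded illustration. It assumes per-bucket vectors of cached buffers and abstracts the plugin's allocate/free calls as callbacks. `DeviceHooks`, `CachedBuffer`, and `allocateOrFreeAndAllocate` are hypothetical; the last is a name a reviewer floats below. The locking that a real manager needs is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the plugin's device allocate/free calls.
struct DeviceHooks {
  std::function<void *(std::size_t)> Alloc; // returns nullptr on OOM
  std::function<void(void *)> Free;
};

struct CachedBuffer {
  void *Ptr;
  std::size_t Size;
};

// Tries to allocate. On failure, frees cached buffers one at a time, starting
// from the bucket with the largest base size, and retries after each free.
// Returns nullptr only if the cache is exhausted and allocation still fails.
void *allocateOrFreeAndAllocate(DeviceHooks &Dev,
                                std::vector<std::vector<CachedBuffer>> &Buckets,
                                std::size_t Size) {
  if (void *P = Dev.Alloc(Size))
    return P;
  for (auto B = Buckets.rbegin(); B != Buckets.rend(); ++B) {
    while (!B->empty()) {
      Dev.Free(B->back().Ptr);
      B->pop_back();
      if (void *P = Dev.Alloc(Size))
        return P; // stop early instead of draining every free list
    }
  }
  return nullptr;
}
```

Returning as soon as one retry succeeds keeps as much of the cache as possible for later reuse.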

Update:
A threshold (8KB by default) controls which allocation sizes the manager handles. It can be overridden with the environment variable LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD.
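The following is one possible way to read the threshold, sketched under assumptions. The value is treated as a byte count, and an unset or unparsable variable falls back to the 8KB default. `0` is taken to mean "disable the manager"; the review below discusses opting out, but the exact semantics here are assumed. `ThresholdConfig`, `parseThreshold`, and `readThresholdFromEnv` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Default taken from the summary above (8KB); assumed to be in bytes.
constexpr std::size_t DefaultThreshold = 8 * 1024;

// Hypothetical parsed form of LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD.
struct ThresholdConfig {
  bool Enabled;          // false: every allocation bypasses the manager
  std::size_t Threshold; // sizes up to this value are managed (assumed)
};

// Assumed semantics: unset or unparsable -> default; "0" -> disabled.
ThresholdConfig parseThreshold(const char *Env) {
  if (!Env)
    return {true, DefaultThreshold};
  try {
    unsigned long Value = std::stoul(Env);
    if (Value == 0)
      return {false, 0};
    return {true, static_cast<std::size_t>(Value)};
  } catch (const std::invalid_argument &) {
    return {true, DefaultThreshold};
  } catch (const std::out_of_range &) {
    return {true, DefaultThreshold};
  }
}

ThresholdConfig readThresholdFromEnv() {
  return parseThreshold(std::getenv("LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD"));
}
```

Keeping the parsing in a function that takes a `const char *` makes it testable without touching the process environment.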

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline


Updated according to comments

tianshilei1992 marked 6 inline comments as done. Aug 5 2020, 12:32 PM

Updated the calculation of NumBuckets

tianshilei1992 marked an inline comment as done. Aug 5 2020, 12:39 PM

Harbormaster completed remote builds in B67160: Diff 283341. Aug 5 2020, 1:28 PM

Use const_iterator

Harbormaster completed remote builds in B67167: Diff 283349. Aug 5 2020, 1:56 PM

Make mutex close to their protected variables

Harbormaster completed remote builds in B67186: Diff 283375. Aug 5 2020, 3:04 PM

Harbormaster completed remote builds in B67197: Diff 283399. Aug 5 2020, 3:57 PM

Removed the plugin interface

tianshilei1992 edited the summary of this revision. Aug 10 2020, 6:37 PM

Some comments and nits you should take under consideration.

I'm not 100% sold on the list design, that we look for the exact size, and that we traverse the list while we look.
However, this is improving a lot over the status quo and we can revisit this with more profiling information later.

LGTM

Nits:
I'd rename memory.h to MemoryManager.h if we don't expect anything else to go in there that is not "a memory manager" at the end of the day. Same with the cpp.
I'm not sure we need the memory namespace, or the impl namespace for that matter.

openmp/libomptarget/src/memory.cpp
25	The last sentence is now obsolete. I'd just state that there is an environment variable to set the threshold for which we will manage allocations. Please actually put the name of the variable here ;).
69	Can we rename `flp2` into `findPreviousPowerOfTwo` or something similarly descriptive?
158
250	This pattern occurs at least twice; it might be worth putting it in a helper method, e.g., `allocateOrFreeAndAllocate` for lack of a better name ;)
254	This message should be more descriptive I guess. "Return nullptr" is not helpful. Maybe spell out that we failed to allocate the requested memory, the device might be OOM. I guess this is also a good spot for some debugger events eventually...

This revision is now accepted and ready to land. Aug 10 2020, 7:02 PM

jdoerfert added inline comments. Aug 10 2020, 7:02 PM

openmp/libomptarget/plugins/exports
22 ↗	(On Diff #284546)	leftover.

Harbormaster completed remote builds in B67826: Diff 284546. Aug 10 2020, 7:07 PM

I'm still doubtful about this. Bump allocate + no-op free is fast unless the GPU runs out of memory before the arena can be dropped. The list and mutex construction is unusual for an allocator.

Could it be moved under the cuda subdirectory, until another plugin wishes to use it? That means the logic for detecting if it's in use and corresponding API disappear for now.
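For comparison, the bump-allocation model mentioned here can be sketched in a few lines. Allocation advances a cursor through one large region, free is a no-op, and everything is released at once by resetting the arena. This is an illustrative host-side sketch. `BumpArenaTy` and the 256-byte default alignment are assumptions, and it is not the bump-like allocator later proposed in D85274.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump arena over one large pre-allocated region (on a device,
// this region would come from a single up-front allocation).
class BumpArenaTy {
  std::uintptr_t Begin, Cur, End;

public:
  BumpArenaTy(void *Base, std::size_t Capacity)
      : Begin(reinterpret_cast<std::uintptr_t>(Base)), Cur(Begin),
        End(Begin + Capacity) {}

  // Allocation is a pointer bump; Align must be a power of two.
  void *allocate(std::size_t Size, std::size_t Align = 256) {
    std::uintptr_t P = (Cur + Align - 1) & ~(std::uintptr_t(Align) - 1);
    if (P > End || Size > End - P)
      return nullptr; // arena exhausted: caller must fall back or reset
    Cur = P + Size;
    return reinterpret_cast<void *>(P);
  }

  void deallocate(void *) {} // no-op free

  // Dropping the arena releases every allocation at once.
  void reset() { Cur = Begin; }
};
```

The trade-off raised in the comment is visible here: both operations are trivial, but memory only comes back on `reset()`, so a long-running program can exhaust the arena.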

Update based on comments

Fixed compilation error

In D81054#2209051, @JonChesterfield wrote:

I'm still doubtful about this. Bump allocate + no-op free is fast unless the GPU runs out of memory before the arena can be dropped. The list and mutex construction is unusual for an allocator.

Right. We are working on that and if it turns out to be always superior we can move to that model. So far, this model is superior to what we had, by a lot.

Could it be moved under the cuda subdirectory, until another plugin wishes to use it? That means the logic for detecting if it's in use and corresponding API disappear for now.

This is *not* CUDA specific at all; please do not move generic things into target sub-directories, that is counterproductive. If another plugin wants to opt in or out, we can add hooks for that. As there is none right now, hooks can be added on demand later.

In D81054#2209051, @JonChesterfield wrote:

I'm still doubtful about this. Bump allocate + no-op free is fast unless the GPU runs out of memory before the arena can be dropped. The list and mutex construction is unusual for an allocator.

The memory manager is not an allocator. We do need the mutex for thread safety. I can't think of a better way than the "list", which is actually a std::multiset here for efficient lookup by size. The bump allocator is in another patch.

In D81054#2208730, @jdoerfert wrote:

Some comments and nits you should take under consideration.

I'm not 100% sold on the list design, that we look for the exact size, and that we traverse the list while we look.

The "list" is not a real list. It is a std::multiset here, so its lookup complexity is O(log n). If we don't use something like that, what would be a better way to organize free nodes of different sizes?

I'd rename memory.h into MemoryManager.h if we don't expect anything else to go in there that is not "a memory manager" at the end of the day. Same with the cpp.

Done.

I'm not sure we need the memory namespace, or the impl namespace for that matter.

I'd prefer to keep the namespace. The current implementation of libomptarget has rather poor code style. Since this is a brand-new file, I'd like to get it right from the start.

Deleted unnecessary changes

Harbormaster completed remote builds in B67990: Diff 284868. Aug 11 2020, 3:03 PM

Harbormaster completed remote builds in B67992: Diff 284873. Aug 11 2020, 3:08 PM

Harbormaster completed remote builds in B67995: Diff 284877. Aug 11 2020, 3:27 PM

OK, cool. If we're open to changing the implementation later, this is fine by me. An instance per host thread is likely to be better than all the internal locks. A couple of minor comments above.

There are use cases for allocating device memory within the plugin itself. I think including MemoryManager.h from within the plugin would work for that.

openmp/libomptarget/src/MemoryManager.h
27 ↗	(On Diff #284877)	Can we drop the shared_ptr here? Better to have the MemoryManager move-only and use unique_ptr
37 ↗	(On Diff #284877)	Deallocate taking a size usually allows a faster implementation, but that can be left until said faster implementation is proposed

In D81054#2213237, @JonChesterfield wrote:

OK, cool. If we're open to changing the implementation later this is fine by me.

Always!

An instance per host thread is likely to be better than all the internal locks.

That is one of the things we can profile and change, no objection if it turns out problematic.

LGTM then. Calling into the plugin to do the bulk alloc/free is nice.

Using std::unique_ptr for the Pimpl

tianshilei1992 marked 2 inline comments as done. Aug 12 2020, 9:19 AM

tianshilei1992 added inline comments.

openmp/libomptarget/src/MemoryManager.h
37 ↗	(On Diff #284877)	I agree. Currently the plugin interface does not have such an argument, so we don't need it. We might add it in the future.

tianshilei1992 marked an inline comment as done. Aug 12 2020, 9:19 AM

In D81054#2213237, @JonChesterfield wrote:

There are use cases for allocating device memory within the plugin itself. I think including MemoryManager.h from within the plugin would work for that.

Unfortunately, that doesn't work because it holds a DeviceTy object. In the future we might have common components that all plugins can share.

Using std::multiset::find instead of std::find_if for better performance

Updated some comments

Harbormaster completed remote builds in B68124: Diff 285107. Aug 12 2020, 10:01 AM

Please mention LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD, default value and unit in the patch summary.
Is it possible to have a unit test testing the manager class behaviors?
Can we offload to host and run address sanitizer or valgrind?

I'm not sure if I'm asking for too much here.

openmp/libomptarget/src/MemoryManager.cpp
324 ↗	(On Diff #285107)	SizeThreshold is global while Threshold is local. The default values are also different. I'm lost in the logic here.
openmp/libomptarget/src/MemoryManager.h
26 ↗	(On Diff #285107)	Why is the pointer needed? What is the design logic behind MemoryManagerTy and MemoryManagerImplTy layers? Can we just have one?
openmp/libomptarget/src/device.h
150 ↗	(On Diff #285107)	Could you explain why shared_ptr is needed?

tianshilei1992 marked 3 inline comments as done. Aug 12 2020, 10:11 AM

tianshilei1992 added inline comments.

openmp/libomptarget/src/MemoryManager.cpp
324 ↗	(On Diff #285107)	Yeah, you're lost. By default, `Threshold` is 0, which means we will not overwrite `SizeThreshold`.
openmp/libomptarget/src/MemoryManager.h
26 ↗	(On Diff #285107)	Pimpl. As mentioned in my previous comments, this header will be included by other files, and I don't want unnecessary headers/declarations/definitions to be pulled in and pollute them.
openmp/libomptarget/src/device.h
150 ↗	(On Diff #285107)	Such that I don't need to include `MemoryManager.h` in the header, and it doesn't hurt anything.
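For readers unfamiliar with the idiom under discussion, here is a minimal Pimpl sketch with `std::unique_ptr`. The header and source parts are shown in one file. The key detail is that the destructor and move operations must be defined where the implementation type is complete. `MemoryManagerTy::ImplTy`, `recordFree`, and `numCachedBytes` are hypothetical. Note that the patch later dropped the Pimpl layer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// --- What would live in the header ---
class MemoryManagerTy {
public:
  MemoryManagerTy();
  ~MemoryManagerTy(); // declared here, defined where ImplTy is complete
  MemoryManagerTy(MemoryManagerTy &&) noexcept;
  MemoryManagerTy &operator=(MemoryManagerTy &&) noexcept;
  MemoryManagerTy(const MemoryManagerTy &) = delete;
  MemoryManagerTy &operator=(const MemoryManagerTy &) = delete;

  std::size_t numCachedBytes() const;
  void recordFree(std::size_t Size);

private:
  struct ImplTy;                // only forward-declared in the header
  std::unique_ptr<ImplTy> Impl; // unique ownership, so the class is move-only
};

// --- What would live in the .cpp file ---
struct MemoryManagerTy::ImplTy {
  std::size_t CachedBytes = 0; // stand-in for free lists, maps, mutexes...
};

MemoryManagerTy::MemoryManagerTy() : Impl(std::make_unique<ImplTy>()) {}
MemoryManagerTy::~MemoryManagerTy() = default; // ImplTy is complete here
MemoryManagerTy::MemoryManagerTy(MemoryManagerTy &&) noexcept = default;
MemoryManagerTy &
MemoryManagerTy::operator=(MemoryManagerTy &&) noexcept = default;

std::size_t MemoryManagerTy::numCachedBytes() const {
  return Impl->CachedBytes;
}
void MemoryManagerTy::recordFree(std::size_t Size) { Impl->CachedBytes += Size; }
```

Files that include only the header part never see `ImplTy`'s members or the headers they need. The cost is an extra heap allocation and an indirection per call, which is the trade-off debated below.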

tianshilei1992 marked 3 inline comments as done. Aug 12 2020, 10:11 AM

Block the patch temporarily for my earlier questions.

This revision now requires changes to proceed. Aug 12 2020, 10:11 AM

In D81054#2213550, @ye-luo wrote:

Please mention LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD, default value and unit in the patch summary.

Sure. Will do.

Is it possible to have a unit test testing the manager class behaviors?

I don't think so. We don't have the "unit test" framework you want. If you insist on tests, I could add a simple "feature" test here.

Can we offload to host and run address sanitizer or valgrind?

What do you mean by offload to host? This memory manager will not be used by the device.

tianshilei1992 edited the summary of this revision. Aug 12 2020, 10:16 AM

ye-luo added inline comments. Aug 12 2020, 10:22 AM

openmp/libomptarget/src/MemoryManager.h
26 ↗	(On Diff #285107)	That is the job of header and cpp files.
openmp/libomptarget/src/device.h
150 ↗	(On Diff #285107)	This is obviously a wrong way. Move the constructor and destructor to cpp.

It definitely can and should be tested. Instantiate on a device that uses host malloc/free for the functions and stress test it under valgrind.

I've started writing tests out of tree for stuff like this. That is not ideal, but it means the code shipped without the tests is likely to be correct.

tianshilei1992 added inline comments. Aug 12 2020, 10:26 AM

openmp/libomptarget/src/MemoryManager.h
26 ↗	(On Diff #285107)	No. You can refer to https://en.cppreference.com/w/cpp/language/pimpl for more details.
openmp/libomptarget/src/device.h
150 ↗	(On Diff #285107)	Why is it a wrong way? Is there any drawback?

In D81054#2213597, @JonChesterfield wrote:

It definitely can and should be tested. Instantiate on a device that uses host malloc/free for the functions and stress test it under valgrind.

The "unit test" Ye mentions is not what you describe here. I agree to add a test like the one you describe, and I will. The "unit test" Ye wants would test the class MemoryManagerTy directly, which is currently not feasible because we don't have a test framework that supports that.

Harbormaster completed remote builds in B68134: Diff 285123. Aug 12 2020, 10:44 AM

Harbormaster completed remote builds in B68136: Diff 285127. Aug 12 2020, 10:59 AM

Improved performance by removing one map table operation

Added a new test

Harbormaster completed remote builds in B68145: Diff 285142. Aug 12 2020, 12:30 PM

Harbormaster completed remote builds in B68152: Diff 285155. Aug 12 2020, 1:12 PM

Replaced std::shared_ptr with std::unique_ptr in the class DeviceTy

Harbormaster completed remote builds in B68171: Diff 285188. Aug 12 2020, 3:36 PM

ye-luo added inline comments. Aug 12 2020, 4:49 PM

openmp/libomptarget/src/MemoryManager.h
26 ↗	(On Diff #285107)	"Pimpl. Like my previous comments mentioned before, this header will be included by others, I don't want unnecessary headers/declarations/definitions to be included to pollute others." Where else do you expect this header to be included? So far there is only device.cpp.
openmp/libomptarget/src/device.cpp
31	Why do you think it is OK for the copy constructor to always set MemoryManager to nullptr? This causes surprises. The same question applies to the assignment operator.

ye-luo added inline comments. Aug 12 2020, 4:54 PM

openmp/libomptarget/src/MemoryManager.cpp
149 ↗	(On Diff #285188)	There can be a race when you test List.empty().
273 ↗	(On Diff #285188)	There can be a race on PtrToNodeTable when you call find().

ye-luo added inline comments. Aug 12 2020, 5:09 PM

openmp/libomptarget/src/MemoryManager.cpp
327 ↗	(On Diff #285188)	make_unique is better.
324 ↗	(On Diff #285107)	Q1. Why is SizeThreshold not per device? Q2. I was asking for a way to opt out of this optimization, but you ignore LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD=0.
openmp/libomptarget/src/device.cpp
314	I think this is your real default. The default value of SizeThreshold always gets overwritten.

ye-luo added inline comments. Aug 12 2020, 5:31 PM

openmp/libomptarget/src/MemoryManager.cpp
107 ↗	(On Diff #285188)	Another shared_ptr. See `typedef std::set<HostDataToTargetTy, std::less<>> HostDataToTargetListTy;` as an example. There doesn't seem to be a need for a pointer wrapping NodeTy.

ye-luo added inline comments. Aug 12 2020, 6:23 PM

openmp/libomptarget/src/MemoryManager.cpp
324 ↗	(On Diff #285107)	Remove Q2. Opt-out has been supported.
openmp/libomptarget/src/MemoryManager.h
30 ↗	(On Diff #285188)	Second (Third?) place with a default. Remove or error out if size 0?

tianshilei1992 marked 12 inline comments as done. Aug 12 2020, 7:35 PM

tianshilei1992 added inline comments.

openmp/libomptarget/src/MemoryManager.cpp
107 ↗	(On Diff #285188)	The same node may be in both the table and the free list. This is intentional: the mapping never changes, which saves us an operation on the map.
149 ↗	(On Diff #285188)	It is intentional, and the "race" is not a problem. Think about it: even if we wrap the check in a lock and the list is empty, one or more nodes may still be returned to this list by the time we move on to the next list. No difference.
273 ↗	(On Diff #285188)	There is no race because same `NodePtr` will never go to two threads.
327 ↗	(On Diff #285188)	Sure.
324 ↗	(On Diff #285107)	Then how could you specify the threshold via an environment variable for each device? You don't even know how many devices you're going to have at compile time.
openmp/libomptarget/src/MemoryManager.h
30 ↗	(On Diff #285188)	Could remove it.
26 ↗	(On Diff #285107)	That is exactly what the Pimpl idiom is for. It is not good practice to put too much implementation detail in the header file.
openmp/libomptarget/src/device.cpp
31	`MemoryManager` will be initialized separately later. The only reason we need this is `std::vector<DeviceTy>` requires it. We don't copy or construct those objects afterwards.
314	Yes. The logic is a little weird. I'll refactor this part.

tianshilei1992 marked 9 inline comments as done. Aug 12 2020, 7:35 PM

JonChesterfield added inline comments. Aug 12 2020, 10:14 PM

openmp/libomptarget/src/MemoryManager.cpp
149 ↗	(On Diff #285188)	That seems to assume list.empty() is an atomic operation. It isn't - calling list.empty() from one thread while another can be inserting into the list is a data race. We could do something involving a relaxed read followed by a lock followed by another read, in the double checked locking fashion. Uncontended locks are cheap though so it's probably not worthwhile.
193 ↗	(On Diff #285188)	This seems bad. Perhaps we should call a function to do this work shortly before destroying the target plugin?
273 ↗	(On Diff #285188)	It looks like PtrToNodeTable can be modified by other threads while this is running. Doesn't matter that NodePtr itself is unique - can't call .find() on the structure while another thread is mutating it.
openmp/libomptarget/src/device.cpp
31	std::vector<DeviceTy> should be content with a move constructor. Then the copy constructor can be = delete.

tianshilei1992 marked 4 inline comments as done. Aug 13 2020, 1:32 PM

tianshilei1992 added inline comments.

openmp/libomptarget/src/MemoryManager.cpp
149 ↗	(On Diff #285188)	Double-checking does not work either. If `empty` could crash because of the data race, that would be a problem; otherwise, it is not. As I said, another thread could still insert a node into the list after we check `empty` under a lock.
193 ↗	(On Diff #285188)	That is a potential problem, but in practice it might not be one. This function is only invoked when the process is about to exit. Even if the deallocation does not succeed, GPU memory will be freed once the process exits anyway.
273 ↗	(On Diff #285188)	Iterators of a `multiset` are not invalidated by insertion, but I'm not sure whether it could crash in some intermediate state, so I'll wrap these accesses in the lock guard.
openmp/libomptarget/src/device.cpp
31	`std::mutex` cannot be moved. That is the only reason we have the copy constructor.

tianshilei1992 marked 4 inline comments as done. Aug 13 2020, 1:32 PM

Updated based on comments

Harbormaster completed remote builds in B68363: Diff 285542. Aug 13 2020, 7:17 PM

JonChesterfield added inline comments. Aug 13 2020, 8:45 PM

openmp/libomptarget/src/MemoryManager.cpp
149 ↗	(On Diff #285188)	Double check requires an atomic read, though relaxed is fine. Or probably some use of barriers. Calling empty() while another thread modifies the list is the race. Because empty() is not atomic qualified, the race is UB. Empty probably resolves to loading two pointers and comparing them for equality, so I sympathise with the argument that the race is benign, but it's still prudent to remove the data races we know about. Less UB, and means data race detectors will have a better chance of helping find bugs.
273 ↗	(On Diff #285188)	A crash would be fine as we'd notice that. It's data corruption due to the race which is the hazard. Thanks for adding it to a locked region.
openmp/libomptarget/src/device.cpp
31	`std::mutex` can't be copied either. If a new default-initialised mutex is OK as the result of the copy, it would be OK as the result of a move too.
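Here is a sketch of this suggestion. The type gets a move constructor that transfers its state and default-constructs a fresh mutex, and copying is deleted. That is enough for `std::vector` to grow. The `DeviceTy` here is a hypothetical two-member stand-in, not libomptarget's class. Moving is only safe while no other thread is using the source object.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical, cut-down stand-in for DeviceTy. The std::mutex member makes
// the implicitly generated copy and move constructors unusable.
struct DeviceTy {
  int DeviceID;
  std::mutex DataMapMtx; // neither copyable nor movable

  explicit DeviceTy(int ID) : DeviceID(ID) {}

  // Move: transfer the plain state, default-construct a fresh mutex.
  // Only safe while no other thread can be using Other.
  DeviceTy(DeviceTy &&Other) noexcept : DeviceID(Other.DeviceID) {}

  // Copying is deleted, so nothing can silently duplicate device state.
  DeviceTy(const DeviceTy &) = delete;
  DeviceTy &operator=(const DeviceTy &) = delete;
  DeviceTy &operator=(DeviceTy &&) = delete;
};
```

Because the move constructor is `noexcept` and copying is deleted, `std::vector<DeviceTy>::emplace_back` can relocate elements on reallocation without needing a copy constructor.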

Removed the pimpl and namespace.

tianshilei1992 marked 3 inline comments as done. Aug 18 2020, 4:45 PM

tianshilei1992 added inline comments.

openmp/libomptarget/src/device.cpp
31	That is beyond the scope of this patch. Since it already has a user-defined copy operator, let's leave it as is.

tianshilei1992 marked an inline comment as done. Aug 18 2020, 4:45 PM

Fixed comment of constructor

Harbormaster completed remote builds in B68820: Diff 286432. Aug 18 2020, 5:27 PM

Harbormaster completed remote builds in B68819: Diff 286430.

In addition:

The DeviceTy copy constructor and assignment operator were already imperfect before this patch. I don't think we can fix them in this patch; we should just document the imperfection here.
Because the size threshold applies per allocation, it seems that the MemoryManager can still hold an unbounded amount of memory, and we have no way to free it. I'm concerned about having this feature on by default.

openmp/libomptarget/src/MemoryManager.cpp
130 ↗	(On Diff #286432)	N->Ptr is deleted here. Then the shared_ptr in FreeLists[I] is deleted, but PtrToNodeTable still holds the shared_ptr and an address that is no longer valid. If I understand correctly, you want FreeLists to hold a subset of the PtrToNodeTable memory segments. I think what you need is `using FreeListTy = std::multiset<std::reference_wrapper<NodeTy>, NodeCmpTy>;` together with `std::unordered_map<void *, NodeTy> PtrToNodeTable;`. This way, PtrToNodeTable is the unique owner of all the memory segments, and FreeList only holds references.
324 ↗	(On Diff #285107)	"Then how could you specify the threshold via environment variable for each device? You don't even know how many devices you're gonna have during the compilation time." Although your current implementation via the environment variable cannot specify the size for each device, we may use a configuration file in the future to control this. It would be helpful if you could facilitate this when cleaning up the logic for the default value.

tianshilei1992 marked an inline comment as done. Aug 18 2020, 7:42 PM

tianshilei1992 added inline comments.

openmp/libomptarget/src/MemoryManager.cpp
130 ↗	(On Diff #286432)	This is a nice catch. Thanks for that. Holding a reference will not solve the problem that the node should also be removed from the map; it is the same as holding a `shared_ptr`. I could, however, use references for the `FreeListTy`. The initial implementation removed the node from the map table and then added it to the free lists. Later I wanted to avoid the unnecessary operation on the map table but forgot to update this spot. I’ll fix it.
324 ↗	(On Diff #285107)	I'd prefer to keep the current approach. If per-device thresholds are requested in the future, we can change it then. This keeps the implementation consistent.

In D81054#2225277, @ye-luo wrote:

Because the memory limit is per allocation, it seems that the MemoryManager can still hold infinite amount of memory and we don't have way to free them. I'm concerned about having this feature on by default.

First, users can always opt out of the feature. More importantly, if we receive complaints that this feature makes their applications run out of memory, we can evaluate it and make the corresponding changes. What we know for now is that many applications benefit from it.

ye-luo added inline comments. Aug 18 2020, 8:01 PM

openmp/libomptarget/src/MemoryManager.cpp
130 ↗	(On Diff #286432)	The correct code needs to take care of both PtrToNodeTable and FreeLists regardless. Currently, in the destructor, you first deal with PtrToNodeTable and then FreeLists, with some nullptr checks. If you switch to references in FreeLists, only PtrToNodeTable needs to be taken care of. I still hope you find that shared_ptr is not needed at all.

tianshilei1992 added inline comments. Aug 18 2020, 8:17 PM

openmp/libomptarget/src/MemoryManager.cpp
130 ↗	(On Diff #286432)	One benefit of using a pointer is that `nullptr` can signal a state, which is very important for keeping the critical section as small as possible. A reference cannot do that, so I would need to do more work inside the critical section, which is counterproductive. I can treat the map table as the container of nodes and use raw pointers in the free lists.

ye-luo added inline comments. Aug 18 2020, 8:27 PM

openmp/libomptarget/src/MemoryManager.cpp
130 ↗	(On Diff #286432)	Please don’t use raw pointers. If you look at reference_wrapper, it has the same cost as taking the address and storing it. C++ gurus invented it for us as a safe alternative.
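Here is a minimal sketch of the ownership layout suggested above. `PtrToNodeTable` owns every node. The free list stores `std::reference_wrapper<NodeTy>`, ordered by size with a transparent comparator so that it can be searched by size. `NodeTy`, `NodeCmpTy`, and the helper functions are hypothetical, and locking is omitted. The sketch relies on `std::unordered_map` keeping references to its elements valid across rehashing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_map>

// Hypothetical node describing one cached device buffer.
struct NodeTy {
  std::size_t Size;
  void *Ptr;
};

// Orders free-list entries by size; is_transparent enables find(Size).
struct NodeCmpTy {
  using is_transparent = void;
  bool operator()(const NodeTy &L, const NodeTy &R) const {
    return L.Size < R.Size;
  }
  bool operator()(const NodeTy &L, std::size_t R) const { return L.Size < R; }
  bool operator()(std::size_t L, const NodeTy &R) const { return L < R.Size; }
};

// The table is the unique owner of every node; free-list entries only refer
// to nodes stored in the table.
std::unordered_map<void *, NodeTy> PtrToNodeTable;
std::multiset<std::reference_wrapper<NodeTy>, NodeCmpTy> FreeList;

// Records a newly allocated buffer; the iterator returned by emplace gives
// access to the node without a second lookup.
NodeTy &recordNew(void *Ptr, std::size_t Size) {
  auto [It, Inserted] = PtrToNodeTable.emplace(Ptr, NodeTy{Size, Ptr});
  assert(Inserted && "pointer is already tracked");
  (void)Inserted;
  return It->second;
}

// Puts a tracked buffer back on the free list; the table entry stays put.
void releaseToFreeList(void *Ptr) {
  FreeList.insert(std::ref(PtrToNodeTable.at(Ptr)));
}

// Hands out a cached buffer of exactly Size bytes, or nullptr if none.
void *reuseExact(std::size_t Size) {
  auto It = FreeList.find(Size);
  if (It == FreeList.end())
    return nullptr;
  void *Ptr = It->get().Ptr;
  FreeList.erase(It);
  return Ptr;
}
```

Because free-list entries are references, a node must be removed from the free list before its table entry is erased. Otherwise the free list would hold a dangling reference.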

ye-luo added inline comments. Aug 18 2020, 8:47 PM

openmp/libomptarget/src/MemoryManager.cpp
219 ↗	(On Diff #286432)	By the time we arrive here, the code should know whether the memory comes from the free list or was newly allocated, so it doesn't even need to do the find. That wastes time. We could just use std::list if we don’t need to find.

Removed all shared_ptr stuffs and fixed one potential issue

tianshilei1992 marked 4 inline comments as done. Aug 19 2020, 11:37 AM

Harbormaster completed remote builds in B68927: Diff 286621. Aug 19 2020, 12:04 PM

ye-luo added inline comments. Aug 19 2020, 12:34 PM

openmp/libomptarget/src/MemoryManager.cpp
214 ↗	(On Diff #286621)	Use emplace and the iterator it returns to avoid the later lookup (at).
234 ↗	(On Diff #286621)	I don't know what the policy on using auto is, but auto makes the code cleaner. There are a few similar places with iterators.
openmp/libomptarget/src/device.cpp
336–337	Prefer else return RTL->data_delete(RTLDeviceID, TgtPtrBegin); the same change to RTL->data_alloc above

Updated based on review comments

tianshilei1992 marked 3 inline comments as done. Aug 19 2020, 1:05 PM

tianshilei1992 added inline comments.

openmp/libomptarget/src/device.cpp
336–337	It's a code style preference. I would go with "no else after return".

tianshilei1992 marked an inline comment as done. Aug 19 2020, 1:05 PM

LGTM

This revision is now accepted and ready to land. Aug 19 2020, 1:17 PM

Harbormaster completed remote builds in B68937: Diff 286642. Aug 19 2020, 1:41 PM

Fixed the build issue when OMPTARGET_DEBUG is not defined

Harbormaster completed remote builds in B68960: Diff 286686. Aug 19 2020, 5:18 PM

Fixed the clang-tidy warning llvm-header-guard

Harbormaster completed remote builds in B68973: Diff 286700. Aug 19 2020, 7:24 PM

Change the header guard to make clang-tidy happy

Harbormaster completed remote builds in B68975: Diff 286702. Aug 19 2020, 7:58 PM

Closed by commit rG0289696751e9: [OpenMP] Introduce target memory manager (authored by tianshilei1992). Aug 19 2020, 8:12 PM

This revision was automatically updated to reflect the committed changes.

tianshilei1992 added a commit: rG0289696751e9: [OpenMP] Introduce target memory manager.

Since I have spent hours hunting down several race conditions in libomp over the last months, please fix races immediately when they are pointed out. There is no such thing as a benign race!

openmp/libomptarget/src/MemoryManager.cpp
149 ↗	(On Diff #285188)	"Double check does not work either. If the `empty` function might crash because of the data race, that is a problem. Otherwise, it is not a problem. Like I said, another thread could still insert node into the list after we check empty using a lock." Double-checking is used to solve a race condition, not a data race. A data race is UB and must be avoided. A race condition is not UB and might be acceptable (benign), but it can also break the code, especially reference counting. To avoid the data race, as Jon said, you should use atomics. You might want to add an atomic counter to avoid the use of the non-atomic List.empty(). When using double-checking, you need to perform all changes under the lock (inserting into the list must be done under the same lock), and all related second checks occur under that same lock. In that case, the issue you raised cannot occur.
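Here is one way to apply this advice, sketched under assumptions. An atomic count mirrors the free list's size, is updated only while holding the list's mutex, and serves as the unlocked "is it empty?" probe instead of calling `empty()` on the container. The real decision is then made under the lock. `BucketTy` and its members are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <set>

// Hypothetical bucket: a free list plus an atomic size mirror, so that the
// unlocked emptiness probe never reads the container itself.
class BucketTy {
  std::multiset<std::size_t> FreeList; // guarded by Mtx
  std::mutex Mtx;
  std::atomic<std::size_t> NumNodes{0}; // written only while holding Mtx

public:
  void insert(std::size_t Size) {
    std::lock_guard<std::mutex> Lock(Mtx);
    FreeList.insert(Size);
    NumNodes.store(FreeList.size(), std::memory_order_relaxed);
  }

  std::optional<std::size_t> takeExact(std::size_t Size) {
    // First check: an atomic load, no lock, and no data race. A stale value
    // only means skipping a bucket that just gained a node, or taking the
    // lock for nothing.
    if (NumNodes.load(std::memory_order_relaxed) == 0)
      return std::nullopt;
    std::lock_guard<std::mutex> Lock(Mtx);
    // Second check: the lookup itself runs on the locked container.
    auto It = FreeList.find(Size);
    if (It == FreeList.end())
      return std::nullopt;
    std::size_t Found = *It;
    FreeList.erase(It);
    NumNodes.store(FreeList.size(), std::memory_order_relaxed);
    return Found;
  }
};
```

What remains is the benign race condition discussed above (a node may arrive just after the probe), but no thread ever reads the `std::multiset` without holding its mutex.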

As a heads up, I'm told this breaks amdgpu tests. @ronlieb is looking at the merge from upstream, don't have any more details at this time. The basic idea of wrapping device alloc seems likely to be sound for all targets so I'd guess we've run into a bug in this patch.

In D81054#2229637, @JonChesterfield wrote:

As a heads up, I'm told this breaks amdgpu tests. @ronlieb is looking at the merge from upstream, don't have any more details at this time. The basic idea of wrapping device alloc seems likely to be sound for all targets so I'd guess we've run into a bug in this patch.

If it is a thread-safety issue, adding a mutex to the outward-facing allocate and free should make the code safe while we investigate the root cause.

In D81054#2229637, @JonChesterfield wrote:

As a heads up, I'm told this breaks amdgpu tests. @ronlieb is looking at the merge from upstream, don't have any more details at this time. The basic idea of wrapping device alloc seems likely to be sound for all targets so I'd guess we've run into a bug in this patch.

Yeah, filing a bug would be nice, because then at least I could get a reproducer. ;-) BTW, all the data races mentioned by others were actually guarded by locks.

JonChesterfield added inline comments. Aug 22 2020, 2:54 AM

openmp/libomptarget/src/MemoryManager.cpp
88 ↗	(On Diff #286705)	This "little issue" of calling into the target plugin after it has been destroyed is a likely reason this patch does not work on amdgpu. I still think the target plugin, if it wishes to use this allocator, should hold the state itself. That means the allocator can be used internally, e.g. for call frames or the parallel region malloc, as well as making the destruction order straightforward and correct.

grokos mentioned this in D85274: [OpenMP] Introduced a bump-like allocator into the target memory management. Oct 5 2020, 4:45 AM

The test asserts for x86 offloading:

memory_manager.cpp.tmp-x86_64-pc-linux-gnu: llvm-project/openmp/libomptarget/test/offloading/memory_manager.cpp:37: int main(int, char **): Assertion `buffer[j] == i' failed.
memory_manager.cpp.tmp-x86_64-pc-linux-gnu: llvm-project/openmp/libomptarget/test/offloading/memory_manager.cpp:37: int main(int, char **): Assertion `buffer[j] == i' failed.

In D81054#2369714, @protze.joachim wrote:

The test asserts for x86 offloading:

memory_manager.cpp.tmp-x86_64-pc-linux-gnu: llvm-project/openmp/libomptarget/test/offloading/memory_manager.cpp:37: int main(int, char **): Assertion `buffer[j] == i' failed.
memory_manager.cpp.tmp-x86_64-pc-linux-gnu: llvm-project/openmp/libomptarget/test/offloading/memory_manager.cpp:37: int main(int, char **): Assertion `buffer[j] == i' failed.

Cannot reproduce the failure on my side.

I tested this with older clang releases (at least back to clang 9.0) and could reproduce the assertion. The error doesn't seem to be related to this patch, but the test just reveals the issue.

I could reduce the issue to:

#include <omp.h>
#include <cassert>
#include <cstdio>
#include <iostream>
#define N 10

int main(int argc, char *argv[]) {
#pragma omp parallel for num_threads(4)
  for (int i = 0; i < 16; ++i) {
    // Each host thread maps its own stack buffer back from the device.
    int buffer[N];
    printf("i=%i, n=%i, buffer=%p\n", i, N, (void *)buffer);
#pragma omp critical
#pragma omp target teams distribute parallel for map(from : buffer)
    for (int j = 0; j < N; ++j) {
      buffer[j] = i;
    }
    // Every element must hold this thread's iteration value.
    for (int j = 0; j < N; ++j) {
      if (buffer[j] != i) {
        printf("buffer[j=%i]=%i != i=%i, buffer=%p\n", j, buffer[j], i,
               (void *)buffer);
        assert(buffer[j] == i);
      }
    }
  }
  std::cout << "PASS\n";
  return 0;
}

So I think the map(from) fails when executed from multiple threads. The issue goes away if the initial test is executed with OMP_NUM_THREADS=1. Adding the critical does not solve the issue, so I don't think a race in libomptarget is causing it.

tcramer added a subscriber: tcramer. Dec 9 2020, 9:49 AM

protze.joachim added inline comments.Dec 10 2020, 8:44 AM

openmp/libomptarget/src/MemoryManager.cpp
88 ↗	(On Diff #286705)	@tianshilei1992 Any plan to fix this? This breaks not only for AMD but also for a plugin our group is working on. Without understanding all the details: should the destructor of DeviceTy delete the MemoryManager? Would this solve the issue, i.e., is the DeviceTy destroyed before the target plugin is unloaded?

protze.joachim added inline comments.Dec 10 2020, 8:53 AM

openmp/libomptarget/src/MemoryManager.cpp
88 ↗	(On Diff #286705)	Never mind, the unique_ptr should take care of the release. So, why is the device not destroyed before the plugin is unloaded?

I think I volunteered to fix the global constructor/destructor hazard, then forgot about it.

My intent is to add functions to the plugin:

some_enum __tgt_rtl_plugin_init(void);
some_enum __tgt_rtl_plugin_dtor(void);

with the invariant that plugin_init is the first function called on a given plugin, and plugin_dtor is the last function called. Probably also that init, dtor are called at most once, and the dtor is called exactly once if init is called.

The initialization that currently occurs for global variables in the plugin can then optionally be done in the init call. Libomptarget shall destroy the memory manager before calling dtor, so that it can make calls into the plugin during the destruction.

This doesn't address multiple instances of a given plugin, but also doesn't preclude it. Any plugin that doesn't implement these, won't have them called.

edit: However, I don't think libomptarget knows when a given plugin is no longer in use. There's a TODO in rtl.cpp about removing a RTL if it's not used any more, but I can't see how that can be derived reliably from calls into interface.cpp.

edit2: If we move LoadRTLs out of the first call to RegisterLib and into init() or the PluginManager constructor, then we can move some unloading logic out of UnregisterLib and call it from deinit(), at which point we'll have a good place to put the teardown.
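A minimal sketch of the teardown ordering proposed above, assuming the plugin's optional hooks are held as nullable function pointers (which is what a failed `dlsym` lookup would leave behind). `PluginTy`, `MemoryManagerSketch`, `RuntimeSketch` and `callLog` are hypothetical stand-ins, not libomptarget types; `plugin_init`/`plugin_dtor` mirror the proposed `__tgt_rtl_plugin_init`/`__tgt_rtl_plugin_dtor`. It only illustrates the invariant: init is the first call, the memory manager is destroyed while the plugin can still serve `data_delete`, and dtor is the last call.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical record of calls made into the plugin, used to check ordering.
inline std::vector<std::string> &callLog() {
  static std::vector<std::string> Log;
  return Log;
}

// Hypothetical plugin table: optional hooks stay null if not implemented.
struct PluginTy {
  int (*plugin_init)() = nullptr;
  int (*plugin_dtor)() = nullptr;
  int (*data_delete)(void *) = nullptr;
};

// Stand-in for a memory manager that releases cached buffers via the plugin
// (one data_delete call stands in for all of them).
struct MemoryManagerSketch {
  PluginTy &Plugin;
  explicit MemoryManagerSketch(PluginTy &P) : Plugin(P) {}
  ~MemoryManagerSketch() {
    if (Plugin.data_delete)
      Plugin.data_delete(nullptr);
  }
};

struct RuntimeSketch {
  PluginTy &Plugin;
  std::unique_ptr<MemoryManagerSketch> MM;

  explicit RuntimeSketch(PluginTy &P) : Plugin(P) {
    if (Plugin.plugin_init) // first call into the plugin
      Plugin.plugin_init();
    MM = std::make_unique<MemoryManagerSketch>(Plugin);
  }

  ~RuntimeSketch() {
    MM.reset(); // free cached device memory while the plugin is still alive
    if (Plugin.plugin_dtor) // last call into the plugin
      Plugin.plugin_dtor();
  }
};
```

A plugin that implements neither hook still works with this scheme; only its `data_delete` is called during teardown.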

I'm surprised to find no dlclose matching the dlopen. Instead of calling some function for init/destroy, can't we just use library constructors/destructors in the plugin? All MemoryManagers for a plugin should then be destroyed before the plugin is explicitly dlclosed.
I'm also surprised that LoadRTLs does not dlclose the library in case of missing symbols.

manorom added a subscriber: manorom. Dec 30 2020, 6:09 PM

manorom added inline comments.

openmp/libomptarget/src/MemoryManager.cpp
88 ↗	(On Diff #286705)	Never mind, the unique_ptr should take care of the release. So, why is the device not destroyed before the plugin is unloaded? Hope I'm not too late to the party, but: if I tracked this down correctly, plugins don't really get unloaded explicitly, only when the host program terminates and the OS unloads the program and its libraries. Plugins keep their state in global objects, so their destructors are called when the plugin library is unloaded (at least that's when the VE plugin cleans up its resources, including its target memory). The MemoryManager is (ultimately) owned by the PluginManager, which is constructed and destroyed explicitly by `__attribute__((constructor))` and `__attribute__((destructor))` functions in `rtl.cpp`. So what I guess happens is that the host program terminates, and then all global destructors are executed, including those in libomptarget and the plugin libraries (before any library actually unloads). The destructor that runs first happens to be the one for the plugin library, and the destructor function that deletes the PluginManager gets called later.

This should be disabled on non-CUDA platforms. It is presently a performance improvement on CUDA, might improve or regress performance on others, and has a call-method-on-dead-object bug that has been open for months.

In particular, I don't think it helps performance on amdgpu, and it's annoying to set an environment variable to suppress a known bug.

I’m going to put the issue on the top of my list.

In D81054#2484343, @tianshilei1992 wrote:

I’m going to put the issue on the top of my list.

Nice! Thank you.

I was thinking of adding an optional function to the plugin API, `bool (*enable_memory_manager)(void)` or similar, which defaults to `return false;` if not implemented. It seems the AMD internal branch currently has an `#if 0` around the entry point to avoid checking an environment variable, but I'd really like to get rid of that local patch.
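One way such an optional hook could be consumed, sketched under the assumption that a plugin which does not export the symbol ends up with a null function pointer. `RTLInfoSketch` and `useMemoryManager` are hypothetical names; the default-to-false behaviour is the one described in the comment above.

```cpp
#include <cassert>

// Hypothetical subset of a plugin's function table. An optional entry point
// the plugin does not export stays null.
struct RTLInfoSketch {
  bool (*enable_memory_manager)() = nullptr;
};

// Whether libomptarget should route data_alloc/data_delete for this plugin
// through the memory manager. A missing hook means "no".
inline bool useMemoryManager(const RTLInfoSketch &RTL) {
  return RTL.enable_memory_manager ? RTL.enable_memory_manager() : false;
}
```

With this shape, the CUDA plugin would opt in by exporting the hook, and every other plugin keeps calling the device allocator directly without needing an environment variable.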

The fix is on Phab now. Please refer to D94256 for more details.

Revision Contents

Path: openmp/libomptarget/src/ (6 files; sizes 5, 6, 35, 256, 7, and 6 lines)

Diff 268054

openmp/libomptarget/src/CMakeLists.txt

##===----------------------------------------------------------------------===##
#
# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
##===----------------------------------------------------------------------===##
#
# Build offloading library libomptarget.so.
#
##===----------------------------------------------------------------------===##

libomptarget_say("Building offloading runtime library libomptarget.")

set(src_files
  api.cpp
  device.cpp
  interface.cpp
+ memory.cpp
  rtl.cpp
  omptarget.cpp
)

# Build libomptarget library with libdl dependency.
add_library(omptarget SHARED ${src_files})
target_link_libraries(omptarget
  ${CMAKE_DL_LIBS}
  "-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exports")

# Install libomptarget under the lib destination folder.
install(TARGETS omptarget LIBRARY COMPONENT omptarget
  DESTINATION "${OPENMP_INSTALL_LIBDIR}")

openmp/libomptarget/src/device.cpp

//===--------- device.cpp - Target independent OpenMP target RTL ----------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Functionality for managing devices that are handled by RTL plugins.
//
//===----------------------------------------------------------------------===//

#include "device.h"
+#include "memory.h"
#include "private.h"
#include "rtl.h"

#include <cassert>
#include <climits>
#include <string>

/// Map between Device ID (i.e. openmp device id) and its DeviceTy.
DevicesTy Devices;

int DeviceTy::associatePtr(void *HstPtrBegin, void *TgtPtrBegin, int64_t Size) {
  DataMapMtx.lock();

  // Check if entry exists
  for (auto &HT : HostDataToTargetMap) {
    if ((uintptr_t)HstPtrBegin == HT.HstPtrBegin) {
      // Mapping already exists
ye-luo (Done): Why do you think it is OK here to leave the copy constructor always setting MemoryManager to nullptr? This causes surprises. The same question applies to the assignment operator as well.
tianshilei1992 (Author, Done): `MemoryManager` will be initialized separately later. The only reason we need this is that `std::vector<DeviceTy>` requires it. We don't copy or construct those objects afterwards.
JonChesterfield (Done): std::vector<DeviceTy> should be content with a move constructor. Then the copy constructor can be `= delete`.
tianshilei1992 (Author, Done): `std::mutex` cannot be moved. That is the only reason we have the copy constructor.
JonChesterfield (Done): `std::mutex` can't be copied either. If a new default-initialised mutex is OK as the result of the copy, it would be OK as the result of a move too.
tianshilei1992 (Author, Done): That is beyond the scope of this patch. Since it already has a user-defined copy operator, let it be.
      bool isValid = HT.HstPtrBegin == (uintptr_t) HstPtrBegin &&
          HT.HstPtrEnd == (uintptr_t) HstPtrBegin + Size &&
          HT.TgtPtrBegin == (uintptr_t) TgtPtrBegin;
      DataMapMtx.unlock();
      if (isValid) {
        DP("Attempt to re-associate the same device ptr+offset with the same "
           "host ptr, nothing to do\n");
        return OFFLOAD_SUCCESS;
(155 unchanged lines not shown)
    if (RTLs->RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
        !HasCloseModifier) {
      DP("Return HstPtrBegin " DPxMOD " Size=%ld RefCount=%s\n",
         DPxPTR((uintptr_t)HstPtrBegin), Size, (UpdateRefCount ? " updated" : ""));
      IsHostPtr = true;
      rc = HstPtrBegin;
    } else {
      // If it is not contained and Size > 0 we should create a new entry for it.
      IsNew = true;
-     uintptr_t tp = (uintptr_t)RTL->data_alloc(RTLDeviceID, Size, HstPtrBegin);
+     uintptr_t tp =
+         (uintptr_t)MemoryManager.Allocate(Size, HstPtrBegin, DeviceID);
jdoerfert (Done): Nit: make `tp` a `void *` and cast the one use of it as `uintptr_t` instead.
tianshilei1992 (Author, Done): Unrelated to this patch, so marking it as Done.
      DP("Creating new map entry: HstBase=" DPxMOD ", HstBegin=" DPxMOD ", "
         "HstEnd=" DPxMOD ", TgtBegin=" DPxMOD "\n", DPxPTR(HstPtrBase),
         DPxPTR(HstPtrBegin), DPxPTR((uintptr_t)HstPtrBegin + Size), DPxPTR(tp));
      HostDataToTargetMap.push_front(HostDataToTargetTy((uintptr_t)HstPtrBase,
          (uintptr_t)HstPtrBegin, (uintptr_t)HstPtrBegin + Size, tp));
      rc = (void *)tp;
    }
  }
(64 unchanged lines not shown)
int DeviceTy::deallocTgtPtr(void *HstPtrBegin, int64_t Size, bool ForceDelete,
  LookupResult lr = lookupMapping(HstPtrBegin, Size);
  if (lr.Flags.IsContained || lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) {
    auto &HT = *lr.Entry;
    if (ForceDelete)
      HT.resetRefCount();
    if (HT.decRefCount() == 0) {
      DP("Deleting tgt data " DPxMOD " of size %ld\n",
         DPxPTR(HT.TgtPtrBegin), Size);
-     RTL->data_delete(RTLDeviceID, (void *)HT.TgtPtrBegin);
+     MemoryManager.Free((void *)HT.TgtPtrBegin, DeviceID);
jdoerfert (Done): Nit: Remove the cast.
tianshilei1992 (Author, Done): Unrelated to this patch, so marking it as Done.
      DP("Removing%s mapping with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD
         ", Size=%ld\n", (ForceDelete ? " (forced)" : ""),
         DPxPTR(HT.HstPtrBegin), DPxPTR(HT.TgtPtrBegin), Size);
      HostDataToTargetMap.erase(lr.Entry);
    }
    rc = OFFLOAD_SUCCESS;
  } else {
    DP("Section to delete (hst addr " DPxMOD ") does not exist in the allocated"
(12 unchanged lines not shown)
  if (RTL->init_requires)
    RTL->init_requires(RTLs->RequiresFlags);
  int32_t rc = RTL->init_device(RTLDeviceID);
  if (rc == OFFLOAD_SUCCESS) {
    IsInit = true;
  }
}

/// Thread-safe method to initialize the device only once.
int32_t DeviceTy::initOnce() {
ye-luo (Done): I think this is your real default. The default value of SizeThreshold always gets overwritten.
tianshilei1992 (Author, Done): Yes. The logic is a little weird. I'll refactor this part.
  std::call_once(InitFlag, &DeviceTy::init, this);

  // At this point, if IsInit is true, then either this thread or some other
  // thread in the past successfully initialized the device, so we can return
  // OFFLOAD_SUCCESS. If this thread executed init() via call_once() and it
  // failed, return OFFLOAD_FAIL. If call_once did not invoke init(), it means
  // that some other thread already attempted to execute init() and if IsInit
  // is still false, return OFFLOAD_FAIL.
  if (IsInit)
    return OFFLOAD_SUCCESS;
  else
    return OFFLOAD_FAIL;
}

// Load binary to device.
__tgt_target_table *DeviceTy::load_binary(void *Img) {
  RTL->Mtx.lock();
  __tgt_target_table *rc = RTL->load_binary(RTLDeviceID, Img);
  RTL->Mtx.unlock();
  return rc;
}

// Submit data to device
ye-luo (Done): Prefer `else return RTL->data_delete(RTLDeviceID, TgtPtrBegin);`, and the same change to RTL->data_alloc above.
tianshilei1992 (Author, Done): It's a code style preference. I would go with "no else after return".
int32_t DeviceTy::data_submit(void *TgtPtrBegin, void *HstPtrBegin,
                              int64_t Size, __tgt_async_info *AsyncInfoPtr) {
  if (!AsyncInfoPtr || !RTL->data_submit_async || !RTL->synchronize)
    return RTL->data_submit(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size);
  else
    return RTL->data_submit_async(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size,
                                  AsyncInfoPtr);
}
(69 unchanged lines not shown)
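This relates to the copy-constructor thread on `DeviceTy` in device.cpp. The sketch below is a minimal version of JonChesterfield's suggestion, using hypothetical `MovableDeviceSketch` and `MemoryManagerSketch` types. The move constructor default-initialises a fresh `std::mutex` for the new object and transfers everything else, so `std::vector` can relocate elements while the copy constructor stays deleted and the memory manager is never silently reset to `nullptr`. It assumes nobody holds the source object's mutex while it is being moved.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct MemoryManagerSketch {}; // placeholder for the real manager

struct MovableDeviceSketch {
  int DeviceID;
  std::mutex DataMapMtx; // neither copyable nor movable
  std::unique_ptr<MemoryManagerSketch> MemoryManager;

  explicit MovableDeviceSketch(int ID)
      : DeviceID(ID), MemoryManager(std::make_unique<MemoryManagerSketch>()) {}

  // The new object gets its own unlocked mutex (default-initialised, as the
  // existing copy constructor already does); the manager is transferred.
  // Only valid while no thread holds Other.DataMapMtx.
  MovableDeviceSketch(MovableDeviceSketch &&Other) noexcept
      : DeviceID(Other.DeviceID),
        MemoryManager(std::move(Other.MemoryManager)) {}

  MovableDeviceSketch(const MovableDeviceSketch &) = delete;
  MovableDeviceSketch &operator=(const MovableDeviceSketch &) = delete;
};
```

Because the copy constructor is deleted, any accidental copy becomes a compile error instead of a device whose manager is null at runtime.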

openmp/libomptarget/src/memory.h

This file was added.

//===----------- memory.h - Target independent memory manager -------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Declarations for target independent memory manager.
//
//===----------------------------------------------------------------------===//

#pragma once

#include <cstddef>

// Forward declaration
struct DeviceTy;

class MemoryManagerTy {
public:
  // Allocate memory from device DeviceId
jdoerfert (Done): If there is no private state I'd go for struct. Though I would have expected private state TBH.
  void *Allocate(size_t Size, void *HstPtr, int DeviceId);

  // Free memory on device DeviceId
  int Free(void *Ptr, int DeviceId);

  // Initialize a manager with D
  void Init(DeviceTy &D);

  // The number of devices it manages
  size_t NumOfDevices();
jdoerfert (Done): Describe what `Threshold` does (in some detail).
				};

				extern MemoryManagerTy MemoryManager;
jdoerfert (Done): I think the MemoryManager, like the StreamManager, is a thing that belongs to a Device. Different devices might choose different implementations etc. That also reduces our global state footprint. Note that you can and should keep the memory.{h,cpp} files, but make the object part of a Device if possible.
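A sketch of the per-device ownership described here. Each device owns its manager through a `std::unique_ptr`, so there is no global `MemoryManager` object, each device could pick its own implementation, and the manager is destroyed together with its device. `DeviceSketch` and `MemoryManagerSketch` are hypothetical. The device is made non-copyable, and therefore not implicitly movable, because the manager keeps a reference back to it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct DeviceSketch; // forward declaration, as memory.h does for DeviceTy

// Hypothetical per-device manager: it only needs to know which device it serves.
class MemoryManagerSketch {
  DeviceSketch &Device;

public:
  explicit MemoryManagerSketch(DeviceSketch &D) : Device(D) {}
  DeviceSketch &device() const { return Device; }
};

struct DeviceSketch {
  int DeviceID;
  // Owned by the device: created with it, destroyed with it.
  std::unique_ptr<MemoryManagerSketch> MemoryManager;

  explicit DeviceSketch(int ID) : DeviceID(ID) {
    MemoryManager = std::make_unique<MemoryManagerSketch>(*this);
  }

  // Copying (and, as a consequence, moving) would leave the manager's
  // back-reference pointing at the old object, so both are disabled.
  DeviceSketch(const DeviceSketch &) = delete;
  DeviceSketch &operator=(const DeviceSketch &) = delete;
};
```

Devices would then be stored behind `std::unique_ptr`, for example in a `std::vector<std::unique_ptr<DeviceSketch>>`, so the back-reference stays valid as the container grows.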

openmp/libomptarget/src/memory.cpp

This file was added.

//===----------- memory.cpp - Target independent memory manager -----------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Functionality for managing target memory.
//
//===----------------------------------------------------------------------===//

jdoerfert (Done): Can you add a description of the algorithm here, please? What is happening and why.

#include <algorithm>
#include <cassert>
#include <list>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

#include "device.h"
#include "memory.h"
#include "rtl.h"

namespace impl {

jdoerfert (Done): The last sentence is now obsolete. I'd just state that there is an environment variable to set the threshold for which we will manage allocations. Please actually put the name of the variable here ;).

constexpr const size_t BucketSize[] = {
    0,        1U << 20, 1U << 21, 1U << 22, 1U << 23, 1U << 24, 1U << 25,
    1U << 26, 1U << 27, 1U << 28, 1U << 29, 1U << 30, 1U << 31};

constexpr const size_t NumBuckets = sizeof(BucketSize) / sizeof(size_t);

// Find the previous number that is power of 2 given a number
inline size_t Flp2(size_t Num) {

JonChesterfield (Done): This maps:
0 -> 0
1 -> 1
2 -> 2
3 -> 2
4 -> 4
which is not the previous power of two. Round down to a power of two could be:
x < 2 ? x : 1 << (31 - __builtin_clz(x - 1))
tianshilei1992 (Author, Done): That is actually what I expect. This function is for a number that is not a power of 2. The comment is not accurate, and I'll update it.
The intention here is to distribute buffers to different buckets based on their previous power of two. For example, 1024, 1025, 1100, and 2000 will all go to the bucket with size 1024.

  Num |= Num >> 1;
  Num |= Num >> 2;
  Num |= Num >> 4;

jdoerfert (Done), suggested edit:
  1U << 8, 1U << 9, 1U << 10, 1U << 11, 1U << 12, 1U << 13};
- constexpr const int NumBuckets = sizeof(BucketSize) / sizeof(size_t);
+ constexpr const int NumBuckets = sizeof(BucketSize) / sizeof(BucketSize[0]);
  /// The threshold to manage memory using memory manager

  Num |= Num >> 8;
  Num |= Num >> 16;
  Num |= Num >> 32;

jdoerfert (Done), suggested edit:
  /// The threshold to manage memory using memory manager
- size_t SizeThreshold = BucketSize[NumBuckets - 1];
+ static size_t SizeThreshold = BucketSize[NumBuckets - 1];
  /// Find the previous number that is power of 2 given a number that is not power

  Num += 1;
  return Num >> 1;
}
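The behaviour described in the thread above, where 1024, 1025, 1100 and 2000 all map to 1024, is a round-down to the largest power of two not exceeding the input. The sketch below uses the same bit-smearing technique under the hypothetical name `floorPowerOfTwo`. It uses `uint64_t` so the `>> 32` step is always well-defined. It ends with `Num - (Num >> 1)` instead of `(Num + 1) >> 1` so that inputs with the top bit set do not wrap around to 0.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two <= Num for Num >= 1 (returns 0 for Num == 0).
// Smear the highest set bit into every lower bit, then keep only that bit.
static uint64_t floorPowerOfTwo(uint64_t Num) {
  Num |= Num >> 1;
  Num |= Num >> 2;
  Num |= Num >> 4;
  Num |= Num >> 8;
  Num |= Num >> 16;
  Num |= Num >> 32;
  // Num is now 2^(k+1) - 1, where 2^k is the highest set bit of the input.
  return Num - (Num >> 1);
}
```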

// Find a suitable bucket
inline int FindBucket(size_t Size) {
  const size_t F = Flp2(Size);
  int L = 0, H = NumBuckets - 1;
  while (H - L > 1) {
    int M = (L + H) >> 1;
    if (BucketSize[M] == F)
      return M;
    if (BucketSize[M] > F)
      H = M - 1;
    else
      L = M;
  }

JonChesterfield (Not Done): Tests for this arithmetic?

  assert(L >= 0 && L < NumBuckets && "L is out of range");
  return L;
}

jdoerfert (Done): No inline but static please. Same above. This feels like something we could use from LLVM or share with libomp... this code duplication is a nightmare.
Anyway, as @JonChesterfield mentioned, we should aim for a unit test here. We could also add an executable test case that hits a bucket really hard to ensure we can deal with it.

struct Node {
  // If Size is zero, this node has not been connected with a target memory
  size_t Size;
  void *Ptr;
  Node(size_t S, void *P) : Size(S), Ptr(P) {}

jdoerfert (Done): `-inline` `+static`
tianshilei1992 (Author, Done): I didn't get that. Why does `inline` not work here? This function is so simple that I would like to see it inlined by the compiler.
jdoerfert (Done): `inline` is two things: a "hint" which affects the inliner heuristic, and a way to get linkonce_odr linkage for functions. It is not a way to force inlining; that is `__attribute__((always_inline))`. That said, there is no need to tell the inliner what to do anyway, but always limit the lifetime of things, so make them static if possible. Take a look at https://godbolt.org/z/Mjnhe8 to see the effects different annotations have.

};

jdoerfert (Done): Function and member comments in doxygen style please. Also in the header.
---
Add a static assert that the node size is 2 * sizeof(void*).
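The requested check, sketched on a copy of `Node` under the hypothetical name `NodeSketch`. The assertion assumes `sizeof(size_t) == sizeof(void *)`, which holds on common 32- and 64-bit targets but is not required by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>

// Mirror of the patch's Node: a buffer size and a device pointer.
struct NodeSketch {
  size_t Size; // 0 means the node is not attached to any device buffer
  void *Ptr;
  NodeSketch(size_t S, void *P) : Size(S), Ptr(P) {}
};

// Fails to compile on a target where the two fields are not pointer-sized.
static_assert(sizeof(NodeSketch) == 2 * sizeof(void *),
              "Node is expected to be exactly two pointers wide");
```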

jdoerfert (Not Done): Can we rename `flp2` to `findPreviousPowerOfTwo` or something similarly descriptive?

class MemoryManagerTy {
  std::list<Node *> FreeList[NumBuckets];

jdoerfert (Done): Comment explaining this thing. I'm also very confused by the duplicated class declaration. Let's not do that.
tianshilei1992 (Author, Done): Not a duplicate. I just don't want to put too many things into a header that will be included by others. Using PImpl could make things better.

  std::list<Node> NodeList;

JonChesterfield (Done): Why list over SmallVector? I can't see a need for iterator stability here.
tianshilei1992 (Author, Done): Any good suggestion? I also think this style is a little weird, but I cannot find a better one.

  std::unordered_map<void *, Node *> PtrToNodeTable;
  std::mutex FreeListLocks[NumBuckets];
  std::mutex MapTableLock;
  std::mutex NodeListLock;
  DeviceTy *Device;

jdoerfert (Done): Comments explaining these things. Maybe place the mutexes next to the things they protect.
Why a std::list and an unordered map?
Naturally, I would have gone with a vector or std::deque. To "delete" elements I would mark them taken. There should be 32-bit padding in a Node anyway. Though it is hard to predict what is good.
I am unsure about the map; without measurements it's guesswork, and I would go with the regular one, but this is fine.

  void *allocateFromDevice(size_t Size, void *HstPtr) {
    return Device->RTL->data_alloc(Device->RTLDeviceID, Size, HstPtr);
  }

  int deleteFromDevice(void *Ptr) {
    return Device->RTL->data_delete(Device->RTLDeviceID, Ptr);
  }

jdoerfert (Done): Comments on all of these please. Maybe `allocateOnDevice` as the name instead?

  // This function is called when it tries to allocate memory on device but the
  // device returns out of memory. It will first free one memory from the last
  // bucket (because its buffers are large then chances are that we just need to
  // free one or two) and try to allocate again until either we can get memory
  // or every buffer we hold has been freed.
  void *freeAndAllocate(size_t Size, void *HstPtr) {

jdoerfert (Done): Hm... running out of memory seems like an "edge case", and if it happens it seems "likely" it happens again. Why not use the opportunity to free everything in the free list while we are here? I mean, it will be "cheaper", complexity-wise, reasonably useful given that the next allocation will hit the same problem, and very much simpler.

    for (int I = NumBuckets; I >= 0; --I) {
      std::list<Node *> &List = FreeList[I];
      std::mutex &Lock = FreeListLocks[I];

JonChesterfield (Done): This is quadratic: each pass around the loop walks through each node of the list.
tianshilei1992 (Author, Done): In the worst case, yes. The worst case is equivalent to releasing all free buffers. That's why this procedure starts from the bucket with the largest size. Each time we release one buffer, we try the allocation once, until the allocation succeeds.

      do {
        Node *P = nullptr;
        // Fetch one node from the list
        {
          std::lock_guard<std::mutex> G(Lock);
          // We have drained this bucket. Move to the next one.
          if (List.empty())
            break;
          P = List.front();
          List.pop_front();
        }
        // Call device routine to free the buffer
        int Ret = deleteFromDevice(P->Ptr);
        // Cannot free memory on device
        // TODO: Maybe should raise an exception?
        if (Ret != OFFLOAD_SUCCESS)

JonChesterfield (Done): LLVM is built with exceptions disabled, so we probably shouldn't raise here.
jdoerfert (Done): This is the runtime, so exceptions "would work". However, no exceptions please. There is no defined interface and no reason to believe the user has a C++ exception handler waiting.

          return nullptr;
        // Clear the node
        P->Ptr = nullptr;
        P->Size = 0;
        // Swap it to the front of NodeList for later reuse
        {
          std::list<Node>::iterator Itr = NodeList.begin();
          std::lock_guard<std::mutex> G(NodeListLock);
          // Find the first empty node
          // TODO: There can be some optimization
          while (Itr->Ptr && &(*Itr) != P)
            ++Itr;
          std::swap(*Itr, *P);
        }
        // Call device routine to try to allocate again
        void *DevPtr = allocateFromDevice(Size, HstPtr);
        if (DevPtr)
          return DevPtr;
      } while (1);
    }
    return nullptr;
  }
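A sketch of the alternative suggested for the out-of-memory path: drain every bucket in one pass, then retry the allocation once. This is linear in the number of cached buffers instead of retrying after each release. `FakeDevice`, `CachedBuffer` and `PoolSketch` are hypothetical stand-ins for the plugin's `data_alloc`/`data_delete` and the manager's free lists. The bucket loop starts at `NumPoolBuckets - 1`, whereas the loop in `freeAndAllocate` above starts at `NumBuckets`, which indexes one past the end of `FreeList`.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical device with a fixed byte capacity; alloc returns nullptr on OOM.
struct FakeDevice {
  size_t Capacity, Used = 0;
  explicit FakeDevice(size_t C) : Capacity(C) {}
  void *alloc(size_t Size) {
    if (Used + Size > Capacity)
      return nullptr;
    Used += Size;
    return new char[Size];
  }
  void dealloc(void *Ptr, size_t Size) {
    delete[] static_cast<char *>(Ptr);
    Used -= Size;
  }
};

struct CachedBuffer {
  void *Ptr;
  size_t Size;
};

constexpr int NumPoolBuckets = 4;

struct PoolSketch {
  FakeDevice &Dev;
  std::list<CachedBuffer> FreeLists[NumPoolBuckets];
  std::mutex FreeListLocks[NumPoolBuckets];

  explicit PoolSketch(FakeDevice &D) : Dev(D) {}

  // Release every cached buffer in a single pass, then retry exactly once.
  void *freeAllAndAllocate(size_t Size) {
    for (int I = NumPoolBuckets - 1; I >= 0; --I) { // last valid index first
      std::lock_guard<std::mutex> Guard(FreeListLocks[I]);
      for (CachedBuffer &B : FreeLists[I])
        Dev.dealloc(B.Ptr, B.Size);
      FreeLists[I].clear();
    }
    return Dev.alloc(Size);
  }
};
```

The trade-off is that buffers which a partial release could have kept cached are also returned to the device. The argument above is that after one OOM the next allocation is likely to hit the same limit anyway.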

public:
  MemoryManagerTy(DeviceTy &D) : Device(&D) {}

  ~MemoryManagerTy() {
    // TODO: There is a little issue that target plugin is destroyed before this
    // object, therefore the memory free will not succeed.
    for (Node &N : NodeList)
      if (N.Ptr)
        deleteFromDevice(N.Ptr);
  }

jdoerfertUnsubmitted

Done

If this is part of the device, the place we tear down the context, this issue should go away, I think.


tianshilei1992 (Author)

Done

Comment seems out of date.
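For reference, tying the manager's lifetime to the device relies on the C++ rule that non-static data members are destroyed in reverse declaration order. The sketch below only illustrates that rule with hypothetical `PluginHandle` and `CachingMemoryManager` types; in libomptarget the plugin is a separately loaded library, so the real teardown order also depends on when the device object itself is destroyed.

```cpp
#include <string>
#include <vector>

// Records the order in which objects are torn down.
std::vector<std::string> TeardownLog;

// Hypothetical stand-in for the loaded target plugin.
struct PluginHandle {
  ~PluginHandle() { TeardownLog.push_back("plugin"); }
};

// Hypothetical manager that must release cached buffers while the plugin is alive.
struct CachingMemoryManager {
  ~CachingMemoryManager() { TeardownLog.push_back("memory manager"); }
};

// Members are destroyed in reverse declaration order, so declaring the
// manager after the plugin guarantees the manager is torn down first.
struct Device {
  PluginHandle Plugin;
  CachingMemoryManager MemoryManager;
};
```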


void *allocate(size_t Size, void *HstPtr) {

assert(Size && "Size is zero");

jdoerfert

Done

At least for malloc and friends, a zero size is totally fine, btw. If we filter this out earlier, we can keep the assert, though.


tianshilei1992 (Author)

Done

Comment seems out of date.
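A minimal sketch of filtering zero-byte requests before they reach the pool, so that the `Size != 0` assertion can stay. `allocateEntry` and `poolAllocate` are hypothetical names, and `std::malloc` stands in for the real bucket lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical pool entry point: may assume a non-zero size.
void *poolAllocate(std::size_t Size) {
  assert(Size && "Size is zero");
  return std::malloc(Size); // stand-in for the real free-list lookup
}

// Hypothetical public entry point: zero-byte requests never reach the pool.
void *allocateEntry(std::size_t Size) {
  if (Size == 0)
    return nullptr; // nothing to allocate; malloc(0) may also return nullptr
  return poolAllocate(Size);
}
```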


void *DevPtr = nullptr;

Node *P = nullptr;

const int B = FindBucket(Size);

jdoerfert

Not Done

deleteOnDevice(N->Ptr);

- FreeLists[I].clear();

+ List.clear();

}

// Try to allocate memory again


std::list<Node *> &List = FreeList[B];

jdoerfert

Done

Descriptive variable names are worth the trouble of typing.


std::mutex &Lock = FreeListLocks[B];

std::list<Node *>::iterator Itr = List.begin();

Lock.lock();

while (Itr != List.end() && (*Itr)->Size != Size)

++Itr;

// No available one

if (Itr == List.end()) {

Lock.unlock();

// Allocate one from device

DevPtr = allocateFromDevice(Size, HstPtr);

jdoerfert

Done

We should round the size up to increase reuse. Also makes all blocks in a bucket the same size.


tianshilei1992 (Author)

Done

Comment seems out of date.
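One possible rounding scheme, sketched under the assumption that buckets are indexed by powers of two. The helper names are hypothetical, and the patch's `FindBucket` may compute its index differently.

```cpp
#include <cstddef>

// Rounds Size up to the next power of two, so every block in a bucket has the
// same size and can serve any request that maps to that bucket.
// (Overflow for sizes above the largest representable power of two is ignored.)
std::size_t roundUpToPowerOfTwo(std::size_t Size) {
  std::size_t Rounded = 1;
  while (Rounded < Size)
    Rounded <<= 1;
  return Rounded;
}

// Hypothetical bucket numbering: log2 of the rounded size.
int bucketIndex(std::size_t Size) {
  int Index = 0;
  for (std::size_t Rounded = roundUpToPowerOfTwo(Size); Rounded > 1; Rounded >>= 1)
    ++Index;
  return Index;
}
```

The cost is internal waste: a 1200 MiB request would be served by a 2048 MiB block, leaving 848 MiB unused, which is the same concern the summary raises against best-fit.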


// If device is OOM, call freeAndAllocate to free some memory in FreeList

// and then allocate again

if (DevPtr == nullptr)

DevPtr = freeAndAllocate(Size, HstPtr);

// Something is wrong

// TODO: Should raise an exception?

if (DevPtr == nullptr)

return nullptr;

jdoerfert

Done

Nothing is wrong; we just ran out of memory. Return a nullptr and all is good.


{

std::lock_guard<std::mutex> G(NodeListLock);

// There is no empty node in the NodeList. Create a new one.

if (NodeList.empty() || NodeList.front().Size != 0)

NodeList.emplace_back(Size, DevPtr);

else {

Node EmptyNode(std::move(NodeList.front()));

NodeList.pop_front();

EmptyNode.Size = Size;

EmptyNode.Ptr = DevPtr;

NodeList.push_back(std::move(EmptyNode));

}

P = &NodeList.back();

}

} else {

jdoerfert

Done

As the lock is released, this can/should go into a helper function.


tianshilei1992 (Author)

Done

Comment seems out of date.
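One way the node bookkeeping after a fresh device allocation could be pulled into a helper, sketched with a hypothetical `recordNewBuffer` and a simplified `Node`. It uses `std::list::splice` rather than moving the node's contents, so a recycled node keeps its address.

```cpp
#include <cstddef>
#include <list>
#include <mutex>

// Simplified node: a device pointer and the size of its buffer (Size == 0 means empty).
struct Node {
  std::size_t Size;
  void *Ptr;
  Node(std::size_t S, void *P) : Size(S), Ptr(P) {}
};

// Records a freshly allocated device buffer, reusing an emptied node from the
// front of the list when one exists, otherwise appending a new node.
Node *recordNewBuffer(std::list<Node> &NodeList, std::mutex &NodeListLock,
                      std::size_t Size, void *DevPtr) {
  std::lock_guard<std::mutex> Guard(NodeListLock);
  if (!NodeList.empty() && NodeList.front().Size == 0) {
    // Relink the empty node at the back; splice keeps its address stable.
    NodeList.splice(NodeList.end(), NodeList, NodeList.begin());
    NodeList.back().Size = Size;
    NodeList.back().Ptr = DevPtr;
  } else {
    NodeList.emplace_back(Size, DevPtr);
  }
  return &NodeList.back();
}
```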


DevPtr = (*Itr)->Ptr;

P = *Itr;

List.erase(Itr);

Lock.unlock();

}

{

std::lock_guard<std::mutex> G(MapTableLock);

PtrToNodeTable[DevPtr] = P;

}

return DevPtr;

}
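The exact-equality lookup at the top of `allocate()` can also be written with `std::find_if` under a `std::lock_guard`, which avoids the manual `lock()`/`unlock()` pairs. A sketch with a hypothetical `CachedBuffer` type and `takeExactFit` helper:

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical cached buffer: its size and device pointer.
struct CachedBuffer {
  std::size_t Size;
  void *Ptr;
};

// Exact-equality policy: only a cached buffer of exactly the requested size is
// reused. Returns nullptr if none exists; the caller then allocates on the device.
void *takeExactFit(std::list<CachedBuffer> &Bucket, std::mutex &BucketLock,
                   std::size_t Size) {
  std::lock_guard<std::mutex> Guard(BucketLock);
  auto It = std::find_if(Bucket.begin(), Bucket.end(),
                         [Size](const CachedBuffer &B) { return B.Size == Size; });
  if (It == Bucket.end())
    return nullptr;
  void *Ptr = It->Ptr;
  Bucket.erase(It);
  return Ptr;
}
```

With cached buffers of 1200 and 1960 bytes, a 1500-byte request finds nothing and falls through to the device allocation, while a 1960-byte request reuses the matching buffer.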

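int free(void *DevPtr) {

Node *P = nullptr;

// Look up the node and remove it from the map table while holding the lock

{

std::lock_guard<std::mutex> G(MapTableLock);

auto Itr = PtrToNodeTable.find(DevPtr);

// This pointer was not allocated by the memory manager

if (Itr == PtrToNodeTable.end())

return OFFLOAD_FAIL;

P = Itr->second;

PtrToNodeTable.erase(Itr);

}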

// Insert the node to the free list

const int B = FindBucket(P->Size);

std::list<Node *> &List = FreeList[B];

std::lock_guard<std::mutex> G(FreeListLocks[B]);

List.push_back(P);

return OFFLOAD_SUCCESS;

}

};

std::vector<std::shared_ptr<MemoryManagerTy>> MemoryManagers;

inline bool isValidDeviceId(int DeviceId) {

return DeviceId >= 0 && static_cast<size_t>(DeviceId) < MemoryManagers.size();

}

tianshilei1992 (Author)

Done

This line should be removed.


} // namespace impl

void MemoryManagerTy::Init(DeviceTy &D) {

impl::MemoryManagers.emplace_back(std::make_shared<impl::MemoryManagerTy>(D));

}

void *MemoryManagerTy::Allocate(size_t Size, void *HstPtr, int DeviceId) {

assert(impl::isValidDeviceId(DeviceId) && "Invalid DeviceId");

return impl::MemoryManagers[DeviceId]->allocate(Size, HstPtr);

}

int MemoryManagerTy::Free(void *DevPtr, int DeviceId) {

assert(impl::isValidDeviceId(DeviceId) && "Invalid DeviceId");

jdoerfert

Not Done

This pattern occurs at least twice; it might be worth putting it in a helper method, e.g., allocateOrFreeAndAllocate, for lack of a better name ;)
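A sketch of the suggested helper, assuming the two device routines are passed in as callbacks; the signature is hypothetical.

```cpp
#include <cstddef>
#include <functional>

// Try a plain device allocation first and, only if the device reports OOM
// (nullptr), fall back to releasing cached buffers and retrying.
void *allocateOrFreeAndAllocate(
    std::size_t Size,
    const std::function<void *(std::size_t)> &AllocateFromDevice,
    const std::function<void *(std::size_t)> &FreeAndAllocate) {
  if (void *DevPtr = AllocateFromDevice(Size))
    return DevPtr;
  return FreeAndAllocate(Size); // may still be nullptr if the device stays OOM
}
```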


return impl::MemoryManagers[DeviceId]->free(DevPtr);

}

size_t MemoryManagerTy::NumOfDevices() { return impl::MemoryManagers.size(); }

jdoerfert

Not Done

This message should be more descriptive I guess. "Return nullptr" is not helpful. Maybe spell out that we failed to allocate the requested memory, the device might be OOM. I guess this is also a good spot for some debugger events eventually...
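A sketch of a more descriptive failure message. `describeAllocationFailure` is a hypothetical helper; in the runtime the text would more likely go through the existing `DP` debug macro than be returned as a string.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// States what was requested, on which device, and the likely cause,
// instead of just "Return nullptr".
std::string describeAllocationFailure(std::size_t Size, int DeviceId) {
  char Buffer[160];
  std::snprintf(Buffer, sizeof(Buffer),
                "Failed to allocate %zu bytes on device %d; the device may be out of memory.",
                Size, DeviceId);
  return Buffer;
}
```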


MemoryManagerTy MemoryManager;

openmp/libomptarget/src/omptarget.cpp

//===------ omptarget.cpp - Target independent OpenMP target RTL -- C++ -*-===//		//===------ omptarget.cpp - Target independent OpenMP target RTL -- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Implementation of the interface to be used by Clang during the codegen of a		// Implementation of the interface to be used by Clang during the codegen of a
// target region.		// target region.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include <omptarget.h>		#include <omptarget.h>

#include "device.h"		#include "device.h"
		#include "memory.h"
#include "private.h"		#include "private.h"
#include "rtl.h"		#include "rtl.h"

#include <cassert>		#include <cassert>
#include <vector>		#include <vector>

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
int DebugLevel = 0;		int DebugLevel = 0;
[688 lines not shown]	for (int32_t i = 0; i < arg_num; ++i) {
bool IsLast, IsHostPtr; // unused.		bool IsLast, IsHostPtr; // unused.
if (arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) {		if (arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) {
DP("Forwarding first-private value " DPxMOD " to the target construct\n",		DP("Forwarding first-private value " DPxMOD " to the target construct\n",
DPxPTR(HstPtrBase));		DPxPTR(HstPtrBase));
TgtPtrBegin = HstPtrBase;		TgtPtrBegin = HstPtrBase;
TgtBaseOffset = 0;		TgtBaseOffset = 0;
} else if (arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE) {		} else if (arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE) {
// Allocate memory for (first-)private array		// Allocate memory for (first-)private array
TgtPtrBegin = Device.RTL->data_alloc(Device.RTLDeviceID,		TgtPtrBegin =
arg_sizes[i], HstPtrBegin);		MemoryManager.Allocate(arg_sizes[i], HstPtrBegin, Device.DeviceID);
if (!TgtPtrBegin) {		if (!TgtPtrBegin) {
DP ("Data allocation for %sprivate array " DPxMOD " failed, "		DP ("Data allocation for %sprivate array " DPxMOD " failed, "
"abort target.\n",		"abort target.\n",
(arg_types[i] & OMP_TGT_MAPTYPE_TO ? "first-" : ""),		(arg_types[i] & OMP_TGT_MAPTYPE_TO ? "first-" : ""),
DPxPTR(HstPtrBegin));		DPxPTR(HstPtrBegin));
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
fpArrays.push_back(TgtPtrBegin);		fpArrays.push_back(TgtPtrBegin);
[66 lines not shown]	#endif
}		}
if (rc != OFFLOAD_SUCCESS) {		if (rc != OFFLOAD_SUCCESS) {
DP ("Executing target region abort target.\n");		DP ("Executing target region abort target.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

// Deallocate (first-)private arrays		// Deallocate (first-)private arrays
for (auto it : fpArrays) {		for (auto it : fpArrays) {
int rt = Device.RTL->data_delete(Device.RTLDeviceID, it);		int rt = MemoryManager.Free(it, Device.DeviceID);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Deallocation of (first-)private arrays failed.\n");		DP("Deallocation of (first-)private arrays failed.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
}		}

// Move data from device.		// Move data from device.
int rt = target_data_end(Device, arg_num, args_base, args, arg_sizes,		int rt = target_data_end(Device, arg_num, args_base, args, arg_sizes,
[11 lines not shown]

openmp/libomptarget/src/rtl.cpp

//===----------- rtl.cpp - Target independent OpenMP target RTL -----------===//		//===----------- rtl.cpp - Target independent OpenMP target RTL -----------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Functionality for handling RTL plugins.		// Functionality for handling RTL plugins.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "device.h"		#include "device.h"
		#include "memory.h"
#include "private.h"		#include "private.h"
#include "rtl.h"		#include "rtl.h"

#include <cassert>		#include <cassert>
#include <cstdlib>		#include <cstdlib>
#include <cstring>		#include <cstring>
#include <dlfcn.h>		#include <dlfcn.h>
#include <mutex>		#include <mutex>
[269 lines not shown]	for (auto &R : AllRTLs) {
size_t start = Devices.size();		size_t start = Devices.size();
Devices.resize(start + R.NumberOfDevices, device);		Devices.resize(start + R.NumberOfDevices, device);
for (int32_t device_id = 0; device_id < R.NumberOfDevices;		for (int32_t device_id = 0; device_id < R.NumberOfDevices;
device_id++) {		device_id++) {
// global device ID		// global device ID
Devices[start + device_id].DeviceID = start + device_id;		Devices[start + device_id].DeviceID = start + device_id;
// RTL local device ID		// RTL local device ID
Devices[start + device_id].RTLDeviceID = device_id;		Devices[start + device_id].RTLDeviceID = device_id;
		// Initialize memory manager
		MemoryManager.Init(Devices[start + device_id]);
}		}

		assert(Devices.size() == MemoryManager.NumOfDevices() &&
		"Devices and MemoryManager have different size");

// Initialize the index of this RTL and save it in the used RTLs.		// Initialize the index of this RTL and save it in the used RTLs.
R.Idx = (UsedRTLs.empty())		R.Idx = (UsedRTLs.empty())
? 0		? 0
: UsedRTLs.back()->Idx + UsedRTLs.back()->NumberOfDevices;		: UsedRTLs.back()->Idx + UsedRTLs.back()->NumberOfDevices;
assert((size_t) R.Idx == start &&		assert((size_t) R.Idx == start &&
"RTL index should equal the number of devices used so far.");		"RTL index should equal the number of devices used so far.");
R.isUsed = true;		R.isUsed = true;
UsedRTLs.push_back(&R);		UsedRTLs.push_back(&R);
[127 lines not shown]


Diff 268054
