This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Add support for unified memory for regular maps
ClosedPublic

Authored by gtbercea on Jul 19 2019, 9:14 AM.

Details

Summary

This patch adds support for using unified memory in the case of regular maps that happen when a target region is offloaded to the device.

For cases where only a single version of the data is required then the host address can be used. When variables need to be privatized in any way or globalized, then the copy to the device is still required for correctness.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gtbercea marked 6 inline comments as done.Jul 22 2019, 1:51 PM
  • Address comments.
gtbercea marked 3 inline comments as done.Jul 22 2019, 1:52 PM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

I'm checking if the pointer is on the actual device. If unified memory is used then the pointers will match and the device present test will return false.
I have now refactored this check to make it more precise: if the host pointer is used then we have a flag for that.

libomptarget/src/device.cpp
290–291 ↗(On Diff #210914)

Fixed.

gtbercea marked an inline comment as done.Jul 22 2019, 1:52 PM
Hahnfeld requested changes to this revision.Jul 23 2019, 6:38 AM

I'd still like to see tests as mentioned in my last comment.

libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

Okay, I think I get what you're saying: You want to check that this particular pointer has been mapped to that specific device_num, right?

But with unified shared memory, this shouldn't matter, right? Because all pointers can be accessed from all devices, no?

libomptarget/src/device.cpp
207–215 ↗(On Diff #210914)

I still don't understand this.

This revision now requires changes to proceed.Jul 23 2019, 6:38 AM
gtbercea marked 2 inline comments as done.Jul 23 2019, 6:49 AM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

Employing the API functions allows the user to bypass the unified memory behaviour and these functions allow the user to manage device pointers explicitly.

libomptarget/src/device.cpp
207–215 ↗(On Diff #210914)

When use_device_ptr is employed the pointer is a true device pointer.

@Hahnfeld can you list the tests you would like to see please? And then I'll add them.

@Hahnfeld can you list the tests you would like to see please? And then I'll add them.

All new functionality should be tested unless there is a good reason it cannot:

Please add tests at least

  • for the API, ie that omp_target_alloc returns distinct memory (I suggest to omp_target_memcpy from host to allocated memory to a different host memory and change the first host memory before copying back).
  • that memory regions are indeed shared (in a target data region, the host should see updates from the host without an update and vice versa)

Additionally, I requested tests for omp_target_is_present inline.

gtbercea updated this revision to Diff 211501.Jul 24 2019, 7:26 AM
  • Add tests.
gtbercea updated this revision to Diff 211505.Jul 24 2019, 7:37 AM
  • Remove default device env var.
gtbercea updated this revision to Diff 211515.Jul 24 2019, 8:21 AM
  • Improve test.

@Hahnfeld I added several tests. Because these tests require unified memory to be supported by the underlying system I have added them as a new type of test: check-libomptarget-nvptx-unified (in addition to the check-libomptarget-nvptx one). These tests should only be run on platforms which support unified memory.

@Hahnfeld I added several tests. Because these tests require unified memory to be supported by the underlying system I have added them as a new type of test: check-libomptarget-nvptx-unified (in addition to the check-libomptarget-nvptx one). These tests should only be run on platforms which support unified memory.

Can we instead have the tests work with the host plugins? That would be much easier. (Sorry, don't have the time to look at the tests right now)

gtbercea updated this revision to Diff 211963.Jul 26 2019, 10:23 AM
  • Add tests.

@Hahnfeld I added several tests. Because these tests require unified memory to be supported by the underlying system I have added them as a new type of test: check-libomptarget-nvptx-unified (in addition to the check-libomptarget-nvptx one). These tests should only be run on platforms which support unified memory.

Can we instead have the tests work with the host plugins? That would be much easier. (Sorry, don't have the time to look at the tests right now)

Done!

Hahnfeld added inline comments.Jul 29 2019, 7:20 AM
libomptarget/test/api/api.c
9 ↗(On Diff #211963)

Please add a manual call to the runtime function such that this test works with older versions of Clang.

22 ↗(On Diff #211963)

Is there a particular reason to use the second device? Can we just take omp_get_default_device?

69 ↗(On Diff #211963)

Can we change the type to int to avoid any problems with floating point numbers?

77–79 ↗(On Diff #211963)

I'd suggest to make this different tests, and make clear that the test exercises unified_shared_memory (maybe a new directory)

82–88 ↗(On Diff #211963)

And I still don't understand why we need omp_target_is_present to return false here. What would be the disadvantage of saying "all data is present on all devices"?

98 ↗(On Diff #211963)

because this sounds really odd: A can be accessed on the device, but is not "present"?

libomptarget/test/offloading/requires_unified_shared_memory_local.c
17–18 ↗(On Diff #211963)

Can we make this int?

25–26 ↗(On Diff #211963)

Why are the pointers cast to long long? Can we just compare them as void *?

Hahnfeld requested changes to this revision.Jul 29 2019, 7:22 AM

Oh, and there's still no test for use_device_ptr

This revision now requires changes to proceed.Jul 29 2019, 7:22 AM
gtbercea marked 6 inline comments as done.Jul 29 2019, 8:01 AM
gtbercea added inline comments.
libomptarget/test/api/api.c
22 ↗(On Diff #211963)

No there isn't and yes we can.

69 ↗(On Diff #211963)

Sure.

82–88 ↗(On Diff #211963)

I said it and I'll say it again: when the user manually requests data to be allocated on the device, data must be allocated on the device. In this case the use of the omp_target_alloc will allocate data on the device even if unified shared memory is active.

98 ↗(On Diff #211963)

This is just a presence test using the OpenMP API. If A is not associated to a device instance then it won't be considered present. Again, if the user wants to handle all of this manually the option is there and unified memory can't make this not an option for the user.

libomptarget/test/offloading/requires_unified_shared_memory_local.c
17–18 ↗(On Diff #211963)

Yes

25–26 ↗(On Diff #211963)

Is there a problem I'm missing if they are cast to long long?

Hahnfeld added inline comments.Jul 29 2019, 8:25 AM
libomptarget/test/api/api.c
82–88 ↗(On Diff #211963)

I don't see the relation to what I asked: Sure, omp_target_alloc has to return fresh memory. But why does this matter for omp_target_is_present?

98 ↗(On Diff #211963)

If A is not associated to a device instance then it won't be considered present.

Can you link me to a paragraph in the spec that mandates this implication? I only know of the following from the Effect of omp_target_is_present:

This routine returns non-zero if the specified pointer would be found present on device device_num by a map clause; otherwise, it returns zero.

I think there's nothing that prevents a non-zero return value because in fact all pointers would be found present by a map clause under unified shared memory.

libomptarget/test/offloading/requires_unified_shared_memory_local.c
25–26 ↗(On Diff #211963)

Yes, casting to long long might truncate, or at least I'm not aware that it guarantees to be of the same size as pointers.

gtbercea marked an inline comment as done.Jul 29 2019, 8:37 AM
gtbercea added inline comments.
libomptarget/test/api/api.c
98 ↗(On Diff #211963)

You would be right to think so if this was not in combination with omp_target_alloc. The fact that it is it means that the pointer was not officially mapped (explicitely or implicitly).

Hahnfeld added inline comments.Jul 29 2019, 8:46 AM
libomptarget/test/api/api.c
98 ↗(On Diff #211963)

So what would be a full motivating example where libomptarget can't return true for any pointer passed to omp_target_is_present? Is it in this test?

grokos added inline comments.Jul 29 2019, 11:02 AM
libomptarget/src/device.cpp
160 ↗(On Diff #211963)

IsHostPtr is never initialized in this function, I think it's better to set it to false explicitly instead of relying on the assumption that the caller has set it to false. You can init it right after rc.

166–167 ↗(On Diff #211963)

This comma is confusing here, can you move it after "active"?

If unified shared memory is active, implicitly mapped variables that are not privatized use host address.

178 ↗(On Diff #211963)

This evaluates to just HstPtrBegin, the two HT.HstPtrBegin cancel out, so you can skip defining tp altogether.

237 ↗(On Diff #211963)

This init can be moved outside the locked-mutex region.

248 ↗(On Diff #211963)

Same here, just use HstPtrBegin instead of tp.

167 ↗(On Diff #211515)

explicitely --> explicitly

174–176 ↗(On Diff #211515)

I'm confused with this condition. What do we want to test?

The condition can be simplified to:

IsImplicit || !lr.Flags.IsContained || Size

because if Size==0, then lookupMapping cannot return ExtendsBefore/After, it may only return IsContained. So we want a mapping that's either

  1. implicit or
  2. explicit and it is either not contained or has Size>0.

The latter doesn't make sense to me... For instance, what if Size>0 and the mapping explicitly extends after? The condition will evaluate to true but this is invalid use no matter whether we use unified shared memory or not.

libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

This is correct, with one little exception. Although the OpenMP standard does not mandate it, upstream clang supports use_device_ptr on pointers which are struct members. Because they are struct members, they are not marked with TARGET_PARAM (only the combined entry is considered a target parameter, not the individual members). On the other hand, they are marked with PTR_AND_OBJ and have some value in the MEMBER_OF bits.

Once again, it's a non-standard extension so we are free to decide whether to support it or not in the unified shared memory scenario.

259 ↗(On Diff #211963)

ne --> be

379–381 ↗(On Diff #211963)

Same here (if we decide to support the struct member case).

507 ↗(On Diff #211963)

unified shared memory

541 ↗(On Diff #211963)

unified shared memory

553–578 ↗(On Diff #211963)

I think we can skip the whole shadow pointer loop when we use unified shared memory.

libomptarget/test/offloading/requires_unified_shared_memory_local.c
25–26 ↗(On Diff #211963)

You can cast them to uintptr_t, it is a datatype guaranteed to be long enough to represent a pointer.

Hahnfeld added inline comments.Jul 29 2019, 11:23 AM
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

Can we please first answer my question why we need to care about the existence of use_device_ptr? Why does it make a difference for unified shared memory?

libomptarget/test/offloading/requires_unified_shared_memory_local.c
25–26 ↗(On Diff #211963)

Again: Why do we need to cast? We can just compare void *!

gtbercea marked an inline comment as done.Jul 29 2019, 11:42 AM
gtbercea added inline comments.
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

Because use_device_ptr implies the use of a true device pointer and again that needs to be respected even unified memory is used.

gtbercea marked an inline comment as done.Jul 29 2019, 11:47 AM
gtbercea added inline comments.
libomptarget/test/api/api.c
98 ↗(On Diff #211963)

Wait, this is not about a motivating example, it's about correctness. The function should return false if A is not "present" on the device. So I think the question is what is the definition of present in this case? As stated before I think in this case A is not actually mapped, perhaps the pointer to A may be implicitly mapped but the actual data belonging to A has not been associated with the pointer.

Hahnfeld added inline comments.Jul 29 2019, 11:53 AM
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

Where is that in the spec?

libomptarget/test/api/api.c
98 ↗(On Diff #211963)

But the data of A will not copied to the device by a map clause, right? So it's already present from that perspective.

gtbercea marked an inline comment as done.Jul 29 2019, 12:04 PM
gtbercea added inline comments.
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

The pointers returned by omp_target_alloc and accessed through use_device_ptr are guaranteed to be pointer values that can support pointer arithmetic while still being native device pointers. (Section 2.4 page 61 bottom).

Hahnfeld added inline comments.Jul 29 2019, 12:23 PM
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

With unified_shared_memory we get: (Section 2.4, page 62, lines 4:7)

Additionally, memory in the device data environment of any device visible to OpenMP, including but not limited to the host, is considered part of the device data environment of all devices accessible through OpenMP except as noted below. Every device address allocated through OpenMP device memory routines is a valid host pointer.

From my understanding, this implies all shared memory can be referenced by a host pointer. This guarantees pointer arithmetic per the C / C++ standard.

libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

Experience from our users show a significant performance degradation when every data is mapped to host memory under unified. Overwhelmingly, users want specific data to be allocated on device & mapped, while at the same time they want generic memory used by the device to be host memory (e.g. for linked list structures).

Recognizing this important use case, the specs has introduced the "close" modifier to the map for users to indicate that the data being mapped should be set "close" to the device, aka for us allocated on the device. Similarly, if the user went through the additional exercise of using target alloc / free, we want to respect this explicit request of the user by allocating the data on the device.

This is why, even under unified memory model, some data is still mapped with duplicated copies between host and device. Thus requiring the maps & use_device_ptr to be tested and enforced when data was actually mapped.

Fortunately for the overhead, we know that only a very small subset of all data accessed will be labeled with "close" or allocated with target_alloc, so the cost of performing this mapping will be very significantly reduced.

Hahnfeld added inline comments.Jul 29 2019, 11:58 PM
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

Alex, I fully agree with all of your points: Yes, we need means to allocate memory on the device (omp_target_alloc and close) and this must be reflected by maps and use_device_ptr. I'm not arguing against this, the behavior of the two methods is pretty clear in the spec.

What I'm asking about is something different, though related: If we have "generic" memory (as you call it, so normal memory on the host that is "shared" with all devices), do we need to allocate device memory if the user specifies use_device_ptr? Because that is what the current patch will do in getOrAllocTgtPtr if IsInUseDevicePtrClause = true.

Let's discuss a concrete example: (assume that the TU has a requires unified_shared_memory)

int A[N];
// init ...
int *Aptr = &A[0];

#pragma omp target data use_device_ptr(A)
{
  // What is the value of A?
}

I think we should have A == Aptr, but this patch will allocate disjoint device memory. (And while writing this, I wonder how libomptarget knows about the size of A? According to the spec, the clause expects a list of pointers, not array sections. So how will this work with opaque pointers passed to an orphaned function?)

gtbercea marked an inline comment as done.Jul 30 2019, 8:59 AM
gtbercea added inline comments.
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

Your program returns the following errors:

test_maps.c:49:42: error: expected pointer or reference to pointer in 'use_device_ptr' clause

#pragma omp target data use_device_ptr(A)
                                       ^

test_maps.c:49:3: error: expected at least one 'map' or 'use_device_ptr' clause for '#pragma omp target data'

#pragma omp target data use_device_ptr(A)
Hahnfeld added inline comments.Jul 30 2019, 9:54 AM
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

That happens if I don't compile test the small snippets I post :-/ either make A a heap pointer or use Aptr in the clause, that should work (though I can't test right now)

gtbercea updated this revision to Diff 212423.Jul 30 2019, 1:14 PM
  • Remove use device ptr flag.
gtbercea updated this revision to Diff 212425.Jul 30 2019, 1:31 PM
gtbercea marked 13 inline comments as done.
  • Address comments.
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

So I think I had misunderstood the spec the first time around, I have now removed the special treatment of the use_device_ptr. The host address should be preserved now.

gtbercea marked 3 inline comments as done.Jul 30 2019, 1:33 PM
gtbercea added inline comments.Jul 30 2019, 1:33 PM
libomptarget/src/device.cpp
178 ↗(On Diff #211963)

I tried and it leads to failing tests when handling structs.

gtbercea updated this revision to Diff 212436.Jul 30 2019, 2:06 PM
gtbercea marked an inline comment as done.
  • Move tests to new folder.
  • Fix tests.
  • Move requires test.

@Hahnfeld @grokos new version of the patch with addressed comments.

@Hahnfeld I cannot replace the usage of the requires pragma with the call to the registration function directly because the registration function is the first function that gets called so if I explicitely invoke it will only be the 2nd call to that function. A subsequent call with a different set of flags will lead to a mismatch in requires clauses error. (first implicit call is without unified_shared_memory and the second call is with unified_shared_memory).

@Hahnfeld I cannot replace the usage of the requires pragma with the call to the registration function directly because the registration function is the first function that gets called so if I explicitely invoke it will only be the 2nd call to that function. A subsequent call with a different set of flags will lead to a mismatch in requires clauses error. (first implicit call is without unified_shared_memory and the second call is with unified_shared_memory).

Yes, but you can have both, the requires directive and the manual call. I understand that this is not needed for newer Clang versions, but right now the test will fail for older releases that don't support requires.


For the tests, I think we should keep requires.c in test/offloading/ because it's not specific to unified_shared_memory. Can you also rename requires_unified_shared_memory_local.c to shared_update.c (or something like that) to make the name more concise?

libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

I'm still not sure about this, what do others think?

libomptarget/src/device.cpp
290–291 ↗(On Diff #210914)

Is deallocTgtPtr still called under unified_shared_memory?

174–176 ↗(On Diff #211515)

I agree with @grokos here.

178 ↗(On Diff #211963)

Again I agree with the (static) analysis by @grokos. If that leads to failing tests, we need to understand why. What was the idea behind the calculation of tp?

207–213 ↗(On Diff #212436)

use_device_ptr is no more, so this change shouldn't be needed anymore?

libomptarget/src/omptarget.cpp
409–418 ↗(On Diff #212436)

We can just move the error check into the branch, or even better move the check for unified_shared_memory to the outer condition.

511–512 ↗(On Diff #212436)

Do we need the shadow pointers under unified_shared_memory and if TgtPtrBegin == HstPtrBegin? If not, we can just move the check to the outer condition.

544–545 ↗(On Diff #212436)

As above, but I think this needs to check TgtPtrBegin == HstPtrBegin.

I think we can just do the following around line 490:

} else if (Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
                TgtPtrBegin == HstPtrBegin) {
  DP("hst data:" DPxMOD " unified and shared, becomes a noop\n", DPxPTR(HstPtrBegin));
  continue;
}
693–703 ↗(On Diff #212436)

Can move the check Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin to line 690, in a similar else branch behind !Pointer_TgtPtrBegin.

libomptarget/test/unified_shared_memory/api.c
14 ↗(On Diff #212436)

Please format the test.

gtbercea marked an inline comment as done.Jul 31 2019, 7:13 AM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

See Alex's answer below!

gtbercea marked an inline comment as done.Jul 31 2019, 7:16 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
178 ↗(On Diff #211963)

The answer is simple: this is how it was done before. Look at the existing pointer calculation I see no reason why it should have to be simplified. The simplification you propose has nothing to do with unified memory so if you think it needs to be done then you should open it up as a separate issue in a separate patch.

Hahnfeld added inline comments.Jul 31 2019, 9:56 AM
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

This doesn't even mention omp_target_is_present, so this is no answer.

gtbercea updated this revision to Diff 212622.Jul 31 2019, 10:32 AM
gtbercea marked 15 inline comments as done.
  • Rename test.
  • Revert file move. Simplify condition.
  • Address comments.
libomptarget/src/device.cpp
178 ↗(On Diff #211963)

I misread the comment. I think it's no problem doing the simplification.

gtbercea marked an inline comment as done.Jul 31 2019, 12:40 PM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

So the problem is that if I return true all the time for the presence test when unified memory is on, then there's no way to distinguish the case where I use omp_target_alloc() but I haven't associated the host pointer with the device pointer yet.

So in other words the presence test is always an "honest" device presence test: it will only return true when there is a host version and corresponding device duplicate version. I think that case is the more meaningful to cover since for the other cases you don't really need a presence test.

gtbercea marked an inline comment as done.Jul 31 2019, 1:09 PM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

I think your confusion is based on the fact that you assume that the presence test is for checking if a future usage of a variable in a device target region leads to a seg fault or not. And I agree that in the non-unified case this intent coincides with that of a map existing.
In the unified case that's not necessarily the case: you can use a variable on the device by referring to its host version without the need of a map existing. The presence test in that case will return false and that's fine.

@Hahnfeld I hope I have answered all your concerns. Please let me know if you have any other comments.

Hahnfeld requested changes to this revision.Aug 1 2019, 9:14 AM

There's still no call to __tgt_register_requires in the two tests, so I guess they won't pass with older versions of Clang.

libomptarget/src/device.cpp
238 ↗(On Diff #212622)

Hehe, casting a void * to uintptr_t and back to void *? Please remove :D

libomptarget/src/omptarget.cpp
545–548 ↗(On Diff #212622)

This can never happen, the loop will continue above (currently line 493).

libomptarget/test/unified_shared_memory/shared_update.c
20 ↗(On Diff #212622)

This test isn't formatted either.

This revision now requires changes to proceed.Aug 1 2019, 9:14 AM

There's still no call to __tgt_register_requires in the two tests, so I guess they won't pass with older versions of Clang.

ah I knew I was forgetting something :)

gtbercea updated this revision to Diff 212855.Aug 1 2019, 10:59 AM
  • Add manual for registering the requires flag.
  • Address comments.
gtbercea marked 3 inline comments as done.Aug 1 2019, 11:00 AM
gtbercea updated this revision to Diff 212885.Aug 1 2019, 12:50 PM
  • Clean-up.
Hahnfeld accepted this revision.Aug 1 2019, 11:59 PM

I'm fine with the last update.

This revision is now accepted and ready to land.Aug 1 2019, 11:59 PM
Hahnfeld added a comment.EditedAug 2 2019, 12:00 AM

Oh, but the tests again look odd. Please run them through clang-format.

grokos added inline comments.Aug 2 2019, 10:42 AM
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

I still don't understand this condition (even in the simplified form). Can you explain what we are testing?

gtbercea updated this revision to Diff 213098.Aug 2 2019, 11:34 AM
  • Fix condition.
gtbercea marked an inline comment as done.Aug 2 2019, 11:43 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

The condition comes from the 1st and 3rd if conditions. I have updated the condition to its more explicit version.

But, from what I can tell, IsContained can never be true in this case, ExtendesBefore/After I think cannot be true either.

IsImplicit can vary, Size also.

So I would reduce the condition to:

(IsImplicit || Size)

Does this now make sense?

gtbercea marked an inline comment as done.Aug 2 2019, 11:58 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Actually it should just be Size I think. Testing it now.

gtbercea updated this revision to Diff 213368.Aug 5 2019, 7:41 AM
  • Fix condition again.
gtbercea marked 2 inline comments as done.Aug 5 2019, 7:43 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Condition has now been fixed. This is it in its simplest form.

gtbercea updated this revision to Diff 213369.Aug 5 2019, 8:12 AM
  • Fix condition to exclude already mapped device data.
gtbercea updated this revision to Diff 213411.Aug 5 2019, 11:10 AM
  • Format tests.

@grokos I have updated the condition. I have also amended the comment to better explain the conditions.

@grokos any thoughts on the updated condition?

grokos added inline comments.Aug 6 2019, 11:56 AM
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

I still think the condition is not correct - it can lead to invalid mappings escaping the runtime check. E.g. what if we use unified memory and we try to explicitly "map more"? The condition will evaluate to true (since we try to extend the mapping, isContained will be false and Size will be > 0), but the mapping is illegal and should be caught by the runtime.

I believe the whole if branch this patch introduces should be removed. Instead, we must let the original code perform the checks and in the last else-branch (line 185 in the original code "else if (Size)" we should check whether or not we use unified shared memory; if yes, then simply return the host address, if no, then proceed with the allocation of memory on the device. In other words, let this function find out whether the requested mapping already exists on the device; if it does we are done, if it doesn't then find out whether we should allocate device memory or use the host version of data.

gtbercea updated this revision to Diff 213691.Aug 6 2019, 12:42 PM
  • Move condition.
gtbercea marked 2 inline comments as done.Aug 6 2019, 12:43 PM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Moved it.

gtbercea updated this revision to Diff 213697.Aug 6 2019, 12:58 PM
gtbercea marked an inline comment as done.
  • Clean up conditions.
grokos added inline comments.Aug 6 2019, 1:41 PM
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

OK, it looks good now. Make sure to rebase the next patch (the close modifier) and extend this condition to check for !close before returning the host address (RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY && !HasCloseModifier).

gtbercea marked 2 inline comments as done.Aug 6 2019, 2:10 PM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Already done :)

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2019, 10:29 AM
Hahnfeld added inline comments.Aug 9 2019, 3:43 AM
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

Apparently, this does not work: The generated code will call __tgt_register_lib first which caches the global RequiresFlags in Device.RTLRequiresFlags. Because __tgt_register_requires has not been called yet, the value is still 0 so the new code won't be executed. Please fix and test on your end that it works with older versions of the compiler!

gtbercea marked 2 inline comments as done.Aug 9 2019, 7:24 AM
gtbercea added inline comments.
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

@Hahnfeld this test works correctly for compiler versions which support unified shared memory.

My proposed fix is to remove all the manual calls and restrict the test to new versions of Clang and do that for every test here and in the close modifier patches. None of the close or unified memory pieces of functionality need to be tested with older clang versions because they are not supported on those versions.

gtbercea marked an inline comment as done.Aug 9 2019, 7:25 AM
Hahnfeld added inline comments.Aug 9 2019, 7:33 AM
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

This will lose a lot of test coverage for the runtime library because the tests can only be run with a not-even-released versions of the compiler.

gtbercea marked an inline comment as done.Aug 9 2019, 7:52 AM
gtbercea added inline comments.
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

New features are only added to the newest version of the compiler and it will work with that and I think that's where the tests need to pass to make sure that incoming functionality doesn't break existing one.

I don't see any reason to test with older versions of Clang that can't even use these features. Am I missing something? What's the gain?

I never get those unified shared memory tests work on NVIDIA GPUs. Is it expected?

Herald added a project: Restricted Project. · View Herald TranscriptTue, Dec 19, 7:15 PM