Page MenuHomePhabricator

[OpenMP][libomptarget] Add support for unified memory for regular maps
ClosedPublic

Authored by gtbercea on Fri, Jul 19, 9:14 AM.

Details

Summary

This patch adds support for using unified memory in the case of regular maps that happen when a target region is offloaded to the device.

For cases where only a single version of the data is required then the host address can be used. When variables need to be privatized in any way or globalized, then the copy to the device is still required for correctness.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • Address comments.
libomptarget/src/omptarget.cpp
247–249 ↗(On Diff #211963)

So I think I had misunderstood the spec the first time around, I have now removed the special treatment of the use_device_ptr. The host address should be preserved now.

gtbercea marked 3 inline comments as done.Tue, Jul 30, 1:33 PM
gtbercea added inline comments.
libomptarget/src/device.cpp
178 ↗(On Diff #211963)

I tried and it leads to failing tests when handling structs.

gtbercea updated this revision to Diff 212436.Tue, Jul 30, 2:06 PM
gtbercea marked an inline comment as done.
  • Move tests to new folder.
  • Fix tests.
  • Move requires test.

@Hahnfeld @grokos new version of the patch with addressed comments.

@Hahnfeld I cannot replace the usage of the requires pragma with the call to the registration function directly because the registration function is the first function that gets called so if I explicitely invoke it will only be the 2nd call to that function. A subsequent call with a different set of flags will lead to a mismatch in requires clauses error. (first implicit call is without unified_shared_memory and the second call is with unified_shared_memory).

@Hahnfeld I cannot replace the usage of the requires pragma with the call to the registration function directly because the registration function is the first function that gets called so if I explicitely invoke it will only be the 2nd call to that function. A subsequent call with a different set of flags will lead to a mismatch in requires clauses error. (first implicit call is without unified_shared_memory and the second call is with unified_shared_memory).

Yes, but you can have both, the requires directive and the manual call. I understand that this is not needed for newer Clang versions, but right now the test will fail for older releases that don't support requires.


For the tests, I think we should keep requires.c in test/offloading/ because it's not specific to unified_shared_memory. Can you also rename requires_unified_shared_memory_local.c to shared_update.c (or something like that) to make the name more concise?

libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

I'm still not sure about this, what do others think?

libomptarget/src/device.cpp
207–213 ↗(On Diff #212436)

use_device_ptr is no more, so this change shouldn't be needed anymore?

178 ↗(On Diff #211963)

Again I agree with the (static) analysis by @grokos. If that leads to failing tests, we need to understand why. What was the idea behind the calculation of tp?

290–291 ↗(On Diff #210914)

Is deallocTgtPtr still called under unified_shared_memory?

174–176 ↗(On Diff #211515)

I agree with @grokos here.

libomptarget/src/omptarget.cpp
409–418 ↗(On Diff #212436)

We can just move the error check into the branch, or even better move the check for unified_shared_memory to the outer condition.

511–512 ↗(On Diff #212436)

Do we need the shadow pointers under unified_shared_memory and if TgtPtrBegin == HstPtrBegin? If not, we can just move the check to the outer condition.

544–545 ↗(On Diff #212436)

As above, but I think this needs to check TgtPtrBegin == HstPtrBegin.

I think we can just do the following around line 490:

} else if (Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
                TgtPtrBegin == HstPtrBegin) {
  DP("hst data:" DPxMOD " unified and shared, becomes a noop\n", DPxPTR(HstPtrBegin));
  continue;
}
693–703 ↗(On Diff #212436)

Can move the check Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin to line 690, in a similar else branch behind !Pointer_TgtPtrBegin.

libomptarget/test/unified_shared_memory/api.c
14 ↗(On Diff #212436)

Please format the test.

gtbercea marked an inline comment as done.Wed, Jul 31, 7:13 AM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

See Alex's answer below!

gtbercea marked an inline comment as done.Wed, Jul 31, 7:16 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
178 ↗(On Diff #211963)

The answer is simple: this is how it was done before. Look at the existing pointer calculation I see no reason why it should have to be simplified. The simplification you propose has nothing to do with unified memory so if you think it needs to be done then you should open it up as a separate issue in a separate patch.

Hahnfeld added inline comments.Wed, Jul 31, 9:56 AM
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

This doesn't even mention omp_target_is_present, so this is no answer.

gtbercea updated this revision to Diff 212622.Wed, Jul 31, 10:32 AM
gtbercea marked 15 inline comments as done.
  • Rename test.
  • Revert file move. Simplify condition.
  • Address comments.
libomptarget/src/device.cpp
178 ↗(On Diff #211963)

I misread the comment. I think it's no problem doing the simplification.

gtbercea marked an inline comment as done.Wed, Jul 31, 12:40 PM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

So the problem is that if I return true all the time for the presence test when unified memory is on, then there's no way to distinguish the case where I use omp_target_alloc() but I haven't associated the host pointer with the device pointer yet.

So in other words the presence test is always an "honest" device presence test: it will only return true when there is a host version and corresponding device duplicate version. I think that case is the more meaningful to cover since for the other cases you don't really need a presence test.

gtbercea marked an inline comment as done.Wed, Jul 31, 1:09 PM
gtbercea added inline comments.
libomptarget/src/api.cpp
118–123 ↗(On Diff #210914)

I think your confusion is based on the fact that you assume that the presence test is for checking if a future usage of a variable in a device target region leads to a seg fault or not. And I agree that in the non-unified case this intent coincides with that of a map existing.
In the unified case that's not necessarily the case: you can use a variable on the device by referring to its host version without the need of a map existing. The presence test in that case will return false and that's fine.

@Hahnfeld I hope I have answered all your concerns. Please let me know if you have any other comments.

Hahnfeld requested changes to this revision.Thu, Aug 1, 9:14 AM

There's still no call to __tgt_register_requires in the two tests, so I guess they won't pass with older versions of Clang.

libomptarget/src/device.cpp
238 ↗(On Diff #212622)

Hehe, casting a void * to uintptr_t and back to void *? Please remove :D

libomptarget/src/omptarget.cpp
545–548 ↗(On Diff #212622)

This can never happen, the loop will continue above (currently line 493).

libomptarget/test/unified_shared_memory/shared_update.c
20 ↗(On Diff #212622)

This test isn't formatted either.

This revision now requires changes to proceed.Thu, Aug 1, 9:14 AM

There's still no call to __tgt_register_requires in the two tests, so I guess they won't pass with older versions of Clang.

ah I knew I was forgetting something :)

gtbercea updated this revision to Diff 212855.Thu, Aug 1, 10:59 AM
  • Add manual for registering the requires flag.
  • Address comments.
gtbercea marked 3 inline comments as done.Thu, Aug 1, 11:00 AM
gtbercea updated this revision to Diff 212885.Thu, Aug 1, 12:50 PM
  • Clean-up.
Hahnfeld accepted this revision.Thu, Aug 1, 11:59 PM

I'm fine with the last update.

This revision is now accepted and ready to land.Thu, Aug 1, 11:59 PM
Hahnfeld added a comment.EditedFri, Aug 2, 12:00 AM

Oh, but the tests again look odd. Please run them through clang-format.

grokos added inline comments.Fri, Aug 2, 10:42 AM
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

I still don't understand this condition (even in the simplified form). Can you explain what we are testing?

gtbercea updated this revision to Diff 213098.Fri, Aug 2, 11:34 AM
  • Fix condition.
gtbercea marked an inline comment as done.Fri, Aug 2, 11:43 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

The condition comes from the 1st and 3rd if conditions. I have updated the condition to its more explicit version.

But, from what I can tell, IsContained can never be true in this case, ExtendesBefore/After I think cannot be true either.

IsImplicit can vary, Size also.

So I would reduce the condition to:

(IsImplicit || Size)

Does this now make sense?

gtbercea marked an inline comment as done.Fri, Aug 2, 11:58 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Actually it should just be Size I think. Testing it now.

gtbercea updated this revision to Diff 213368.Mon, Aug 5, 7:41 AM
  • Fix condition again.
gtbercea marked 2 inline comments as done.Mon, Aug 5, 7:43 AM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Condition has now been fixed. This is it in its simplest form.

gtbercea updated this revision to Diff 213369.Mon, Aug 5, 8:12 AM
  • Fix condition to exclude already mapped device data.
gtbercea updated this revision to Diff 213411.Mon, Aug 5, 11:10 AM
  • Format tests.

@grokos I have updated the condition. I have also amended the comment to better explain the conditions.

@grokos any thoughts on the updated condition?

grokos added inline comments.Tue, Aug 6, 11:56 AM
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

I still think the condition is not correct - it can lead to invalid mappings escaping the runtime check. E.g. what if we use unified memory and we try to explicitly "map more"? The condition will evaluate to true (since we try to extend the mapping, isContained will be false and Size will be > 0), but the mapping is illegal and should be caught by the runtime.

I believe the whole if branch this patch introduces should be removed. Instead, we must let the original code perform the checks and in the last else-branch (line 185 in the original code "else if (Size)" we should check whether or not we use unified shared memory; if yes, then simply return the host address, if no, then proceed with the allocation of memory on the device. In other words, let this function find out whether the requested mapping already exists on the device; if it does we are done, if it doesn't then find out whether we should allocate device memory or use the host version of data.

gtbercea updated this revision to Diff 213691.Tue, Aug 6, 12:42 PM
  • Move condition.
gtbercea marked 2 inline comments as done.Tue, Aug 6, 12:43 PM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Moved it.

gtbercea updated this revision to Diff 213697.Tue, Aug 6, 12:58 PM
gtbercea marked an inline comment as done.
  • Clean up conditions.
grokos added inline comments.Tue, Aug 6, 1:41 PM
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

OK, it looks good now. Make sure to rebase the next patch (the close modifier) and extend this condition to check for !close before returning the host address (RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY && !HasCloseModifier).

gtbercea marked 2 inline comments as done.Tue, Aug 6, 2:10 PM
gtbercea added inline comments.
libomptarget/src/device.cpp
174 ↗(On Diff #212885)

Already done :)

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptWed, Aug 7, 10:29 AM
Hahnfeld added inline comments.Fri, Aug 9, 3:43 AM
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

Apparently, this does not work: The generated code will call __tgt_register_lib first which caches the global RequiresFlags in Device.RTLRequiresFlags. Because __tgt_register_requires has not been called yet, the value is still 0 so the new code won't be executed. Please fix and test on your end that it works with older versions of the compiler!

gtbercea marked 2 inline comments as done.Fri, Aug 9, 7:24 AM
gtbercea added inline comments.
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

@Hahnfeld this test works correctly for compiler versions which support unified shared memory.

My proposed fix is to remove all the manual calls and restrict the test to new versions of Clang and do that for every test here and in the close modifier patches. None of the close or unified memory pieces of functionality need to be tested with older clang versions because they are not supported on those versions.

gtbercea marked an inline comment as done.Fri, Aug 9, 7:25 AM
Hahnfeld added inline comments.Fri, Aug 9, 7:33 AM
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

This will lose a lot of test coverage for the runtime library because the tests can only be run with a not-even-released versions of the compiler.

gtbercea marked an inline comment as done.Fri, Aug 9, 7:52 AM
gtbercea added inline comments.
openmp/trunk/libomptarget/test/unified_shared_memory/shared_update.c
28–30

New features are only added to the newest version of the compiler and it will work with that and I think that's where the tests need to pass to make sure that incoming functionality doesn't break existing one.

I don't see any reason to test with older versions of Clang that can't even use these features. Am I missing something? What's the gain?