This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members
ClosedPublic

Authored by grokos on Mar 6 2018, 7:26 PM.

Details

Summary

This patch removes the translation code since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensure proper alignment of struct members when we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning) can result in distortion of the alignment for some of its member fields. Padding restores the original (proper) alignment.

Diff Detail

Repository
rL LLVM

Event Timeline

grokos created this revision.Mar 6 2018, 7:26 PM
grokos retitled this revision from [Clang][OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP] New clang/libomptarget map interface: remove translation code.Mar 6 2018, 7:26 PM
Hahnfeld requested changes to this revision.Mar 7 2018, 1:45 AM

These are two changes and need to be two patches.

libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

I thought this is now done in the compiler?

This revision now requires changes to proceed.Mar 7 2018, 1:45 AM
grokos updated this revision to Diff 137396.Mar 7 2018, 7:56 AM

OK, I left the change to the internal device ID representation out of this patch.

grokos added inline comments.Mar 7 2018, 8:23 AM
libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

No, it's not. The compiler could take care of this issue, but it's not its job. The compiler should just inform the runtime that we requested a mapping starting from some address. If CUDA memcpy, for instance, has some requirements regarding the alignment of addresses, that's not the compiler's business. The compiler doesn't and shouldn't care about what happens at the plugin level of libomptarget.

grokos edited the summary of this revision. (Show Details)Mar 7 2018, 8:24 AM
Hahnfeld added inline comments.Mar 7 2018, 9:02 AM
libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

If that's specific to CUDA, why does it happen in the plugin agnostic part of libomptarget?

grokos added inline comments.Mar 7 2018, 9:16 AM
libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

That's a good point. A more elegant solution would be to extend the plugin interface (__tgt_rtl_* functions) with a new function which the agnostic library can query in order to get any alignment requirements. I'm in favour of this approach, but I need to ask other people what they think. In any case, implementing this potential change is not part of this patch. Thoughts?

RaviNarayanaswamy added inline comments.
libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

I am not sure what you are trying to do here. For structure members, the compiler should generate the begin address, offset and size. The code generated for the target is using the beginning of the struct to access the field. So cannot just pad the field member.

Hahnfeld added inline comments.Mar 9 2018, 1:27 AM
libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

Good point, this needs clarification. Maybe @grokos could share a code example where this padding is needed and point to documentation where it says that cuMemcpy can only handle aligned pointers?

My guess which might be completely wrong: Maybe the begin address is just for transfer and the target code will use the subsequent entries which point to the member directly?

grokos added inline comments.Mar 9 2018, 2:48 PM
libomptarget/src/omptarget.cpp
197–199 ↗(On Diff #137318)

@RaviNarayanaswamy : What you refer to is the Base address, which is the starting address of the struct. And you are right, the target code uses this address to access members of the struct and we cannot modify it. What I am padding is the Begin address, which is the address of the first mapped member.

I was mistaken by the debug output, padding is not needed for memory transfers, but for the kernel execution itself. It ensures that the alignment of each mapped field remains what it should be. E.g.

struct S {
  int a;   // 4-aligned
  int b;   // 4-aligned
  int *p;  // 8-aligned
} s1;
...
#pragma omp target map(tofrom: s1.b, s1.p[0:N])
{
  s1.b = 5;
  for (int i...) s1.p[i] = ...;
}

In this example we are mapping s1 starting from member b. So, BaseAddress=&s1=&s1.a and BeginAddress=&s1.b. Let's assume that the struct begins at address 0x100. Then &s1.a=0x100, &s1.b=0x104, &s1.p=0x108. Each member obeys the alignment requirements for its type.

Now, when we allocate memory on the device, in CUDA's case cuMemAlloc() returns an address which is at least 256-aligned. This means that the chunk of the struct on the device will start at a 256-aligned address, let's say 0x200. Then the address of b will be 0x200 and address of p will be a misaligned 0x204 (on the host there was no need to add padding between b and p, so p comes exactly 4 bytes after b). If the device kernel tries to access s1.p, a misaligned address error occurs (as reported by the CUDA plugin).

By padding the begin address down to a multiple of 8 and extending the size of the allocated chuck accordingly, the chuck on the device will start at 0x200 with the padding (4 bytes), then &s1.b=0x204 and &s1.p=0x208, as they should be to satisfy the alignment requirements.

Was the new map interface added to Clang and just missed it?

Was the new map interface added to Clang and just missed it?

No, the clang patch needs to be reimplemented, the current version is not going to be accepted. I'm looking into it.

The libomptarget-side of things does not need to change, however, so this patch will remain as-is, waiting to be upstreamed once the clang patch is ready (unless there are comments of course).

Hi George, when do you plan to commit this patch? I cleaned the patch for the clang and it is ready to be committed.

Hi George, when do you plan to commit this patch? I cleaned the patch for the clang and it is ready to be committed.

Let's coordinate in order to commit both patches at the same time. The patch is currently blocked by Jonas.

@Hahnfeld: Is there any other issue that needs to be addressed? If not, is this good to go?

Hahnfeld accepted this revision.Jul 18 2018, 11:34 PM

Let's coordinate in order to commit both patches at the same time. The patch is currently blocked by Jonas.

@Hahnfeld: Is there any other issue that needs to be addressed? If not, is this good to go?

I think we were mainly waiting for the Clang patch, removing the translation code itself is highly desirable (and is blocking some outstanding fixes).

A second point was the alignment problem which I think I now understood. Please update the patch description accordingly so that others can get the rationale from looking at the commit log.
If the code you posted inline triggers the problem (if not, I'd need a minimal example that would break), I'm happy for now and will test after the changes have landed.

One minor question inline, but I think we can proceed for now as long as we don't regress.

libomptarget/src/omptarget.cpp
202–204 ↗(On Diff #137396)

Again looking at this code, what if the member_of comes later and is not directly adjacent?

This revision is now accepted and ready to land.Jul 18 2018, 11:34 PM
grokos retitled this revision from [OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members.Jul 19 2018, 5:48 AM
grokos edited the summary of this revision. (Show Details)
grokos updated this revision to Diff 156254.Jul 19 2018, 6:24 AM

Added example code to demonstrate the need for padding in partially mapped structs.

grokos added inline comments.Jul 19 2018, 6:31 AM
libomptarget/src/omptarget.cpp
202–204 ↗(On Diff #137396)

This is a concern, but at least for now clang produces the combined entry followed by the individual fields. I think this behavior should be enforced in the compiler. Otherwise, we would have to scan all other entries trying to find a member_of. This would be a process of quadratic complexity, which is what we wanted to eliminate via the new map interface.

@ABataev: Can you put a link to the clang-side patch in the description so that we link the two patches together? Also, please let me know when you commit the clang patch so that I commit this one as well.

@ABataev: Can you put a link to the clang-side patch in the description so that we link the two patches together? Also, please let me know when you commit the clang patch so that I commit this one as well.

The patch will be committed as is, without the review. I planned to commit it as soon as you commit the libomptarget patch.

This revision was automatically updated to reflect the committed changes.

@ABataev did you commit the Clang patch and I just missed it?

Yes, it was committed

Best regards,
Alexey Bataev

23 июля 2018 г., в 5:18, Jonas Hahnfeld via Phabricator <reviews@reviews.llvm.org> написал(а):

Hahnfeld added a comment.

@ABataev did you commit the Clang patch and I just missed it?

Repository:
rL LLVM

https://reviews.llvm.org/D44186