This patch removes the translation code since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensure proper alignment of struct members when we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning) can result in distortion of the alignment for some of its member fields. Padding restores the original (proper) alignment.
Details
- Reviewers
Hahnfeld jlpeyton jhen - Commits
- rGa0da24683b2b: [OpenMP][libomptarget] New map interface: remove translation code and ensure…
rOMP337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure…
rL337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure…
Diff Detail
- Repository
- rL LLVM
Event Timeline
These are two changes and need to be two patches.
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | I thought this is now done in the compiler? |
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | No, it's not. The compiler could take care of this issue, but it's not its job. The compiler should just inform the runtime that we requested a mapping starting from some address. If CUDA memcpy, for instance, has some requirements regarding the alignment of addresses, that's not the compiler's business. The compiler doesn't and shouldn't care about what happens at the plugin level of libomptarget. |
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | If that's specific to CUDA, why does it happen in the plugin agnostic part of libomptarget? |
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | That's a good point. A more elegant solution would be to extend the plugin interface (__tgt_rtl_* functions) with a new function which the agnostic library can query in order to get any alignment requirements. I'm in favour of this approach, but I need to ask other people what they think. In any case, implementing this potential change is not part of this patch. Thoughts? |
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | I am not sure what you are trying to do here. For structure members, the compiler should generate the begin address, offset and size. The code generated for the target is using the beginning of the struct to access the field. So cannot just pad the field member. |
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | Good point, this needs clarification. Maybe @grokos could share a code example where this padding is needed and point to documentation where it says that cuMemcpy can only handle aligned pointers? My guess which might be completely wrong: Maybe the begin address is just for transfer and the target code will use the subsequent entries which point to the member directly? |
libomptarget/src/omptarget.cpp | ||
---|---|---|
197–199 ↗ | (On Diff #137318) | @RaviNarayanaswamy : What you refer to is the Base address, which is the starting address of the struct. And you are right, the target code uses this address to access members of the struct and we cannot modify it. What I am padding is the Begin address, which is the address of the first mapped member. I was mistaken by the debug output, padding is not needed for memory transfers, but for the kernel execution itself. It ensures that the alignment of each mapped field remains what it should be. E.g. struct S { int a; // 4-aligned int b; // 4-aligned int *p; // 8-aligned } s1; ... #pragma omp target map(tofrom: s1.b, s1.p[0:N]) { s1.b = 5; for (int i...) s1.p[i] = ...; } In this example we are mapping s1 starting from member b. So, BaseAddress=&s1=&s1.a and BeginAddress=&s1.b. Let's assume that the struct begins at address 0x100. Then &s1.a=0x100, &s1.b=0x104, &s1.p=0x108. Each member obeys the alignment requirements for its type. Now, when we allocate memory on the device, in CUDA's case cuMemAlloc() returns an address which is at least 256-aligned. This means that the chunk of the struct on the device will start at a 256-aligned address, let's say 0x200. Then the address of b will be 0x200 and address of p will be a misaligned 0x204 (on the host there was no need to add padding between b and p, so p comes exactly 4 bytes after b). If the device kernel tries to access s1.p, a misaligned address error occurs (as reported by the CUDA plugin). By padding the begin address down to a multiple of 8 and extending the size of the allocated chuck accordingly, the chuck on the device will start at 0x200 with the padding (4 bytes), then &s1.b=0x204 and &s1.p=0x208, as they should be to satisfy the alignment requirements. |
No, the clang patch needs to be reimplemented, the current version is not going to be accepted. I'm looking into it.
The libomptarget-side of things does not need to change, however, so this patch will remain as-is, waiting to be upstreamed once the clang patch is ready (unless there are comments of course).
Hi George, when do you plan to commit this patch? I cleaned the patch for the clang and it is ready to be committed.
Let's coordinate in order to commit both patches at the same time. The patch is currently blocked by Jonas.
@Hahnfeld: Is there any other issue that needs to be addressed? If not, is this good to go?
I think we were mainly waiting for the Clang patch, removing the translation code itself is highly desirable (and is blocking some outstanding fixes).
A second point was the alignment problem which I think I now understood. Please update the patch description accordingly so that others can get the rationale from looking at the commit log.
If the code you posted inline triggers the problem (if not, I'd need a minimal example that would break), I'm happy for now and will test after the changes have landed.
One minor question inline, but I think we can proceed for now as long as we don't regress.
libomptarget/src/omptarget.cpp | ||
---|---|---|
202–204 ↗ | (On Diff #137396) | Again looking at this code, what if the member_of comes later and is not directly adjacent? |
libomptarget/src/omptarget.cpp | ||
---|---|---|
202–204 ↗ | (On Diff #137396) | This is a concern, but at least for now clang produces the combined entry followed by the individual fields. I think this behavior should be enforced in the compiler. Otherwise, we would have to scan all other entries trying to find a member_of. This would be a process of quadratic complexity, which is what we wanted to eliminate via the new map interface. |
@ABataev: Can you put a link to the clang-side patch in the description so that we link the two patches together? Also, please let me know when you commit the clang patch so that I commit this one as well.
The patch will be committed as is, without the review. I planned to commit it as soon as you commit the libomptarget patch.