This patch fixes the issue that list items in has_device_addr are still mapped
to the target device because front end emits map type OMP_MAP_TO.
Fix #59160.
Differential D141627
[Clang][OpenMP] Fix the issue that list items in `has_device_addr` are still mapped to the target device tianshilei1992 on Jan 12 2023, 12:20 PM. Authored by
Details This patch fixes the issue that list items in has_device_addr are still mapped Fix #59160.
Diff Detail
Event TimelineComment Actions
However, it is still true that has_device_addr(a, b) are being treated the same as map(to:a, b) for arrays, instead of passing the addresses &a, &b directly into the kernel as LITERALs. I think @jyu2 was working on changing that, so she might be able to say what changes are needed in clang for that. Comment Actions I agree that b is not right here, but that doesn’t matter because I stepped into the runtime library and it crashed when processing a. And why are they treated as to? Comment Actions Treating has_device_addr(a) as map(a) is incorrect. I think it is just a vestige of the prior implementation, where has_device_addr was fully ignored and instead (map(tofrom)) kicked in for a (which is the implicit map for arrays). The test likely passed on x86_64 plugin because it just re-mapped the output of use_device_addr(a), which is a device address, again, but on architectures without unified memory, this re-mapping won't work, hence the failure you see with Cuda. #include <stdio.h> int main() { short a[10], b[10]; a[1] = 111; b[1] = 111; printf("%hd %hd %p %p\n", a[1], b[1], &a, &b); // 111 111 p1h p2h #pragma omp target data map(to : a[0 : 2], b[0 : 2]) use_device_addr(a, b) { printf("%p %p\n", &a, &b); // p1d p2d #pragma omp target has_device_addr(a) has_device_addr(b) { a[1] = 222; b[1] = 222; printf("%hd %hd %p %p\n", a[1], b[1], &a, &b); // 222 222 p1d p2d } } // CHECK:111 printf("%hd %hd %p %p\n", a[1], b[1], &a, &b); // 111 111 p1h p2h } $ clang -O0 -fopenmp -fopenmp-targets=x86_64 hda_test.c -fopenmp-version=51 && ./a.out 111 111 0x7fff2a47ecb0 0x7fff2a47ec90 // 111 111 p1h p2h 0x55f3cf685b10 0x55f3cf685c10 // p1d p2d: device versions of p1h, p2h 222 222 0x55f3cf685d70 0x55f3cf685e70 // p1dd p2dd: another different device version of the two, because of tthe remapping. These should have been "p1d p2d". 111 111 0x7fff2a47ecb0 0x7fff2a47ec90 // 111 111 p1h p2h Comment Actions FWIW, I think has_device_addr(b[0]) is not trying to take the value of b[0] in this case. Instead, it's just to take the address of the first element of b. Only pointer arithmetic will be involved. It's not necessarily illegal to do it in that way. Comment Actions That is true. It would be wrong if b was a pointer, for which &b[0] computation would have involved a load. Comment Actions That part of code is original add for is_device_address, so I just wonder, if the change could break is_device_address? Comment Actions Now I kinda think it is not right to mix is_device_address and has_device_addr. Basically, is_device_address means the list items are device address, so the address should be taken as literal, directly passed to the kernel. On the other hand, has_device_addr indicates that the list items *should* have device address, which means there has to be an entry for that. Note that it is different from the OpenMP map clause. OpenMP's map clause (w/o always of course) means if the list items are not mapped, do it, and transfer the data accordingly; otherwise, use the one in the map table. I think has_device_addr only means map table lookup. Use it if found, otherwise undefined behavior (per spec). We are not supposed to update mapping table. So back this patch, or clang front end, I think the correct way to handle this is to create a new flag, indicating the mapping is supposed to exist. The runtime needs to be changed accordingly in a way that if the flag is set, it should error out if it doesn't find any mapping. Meanwhile, I think the test case for has_device_addr is not correct. void xoo() { short a[10], b[10]; a[1] = 111; b[1] = 111; #pragma omp target data map(to : a[0 : 2], b[0 : 2]) use_device_addr(a, b) #pragma omp target has_device_addr(a) has_device_addr(b[0]) { a[1] = 222; b[1] = 222; // CHECK: 222 222 printf("%hd %hd %p %p %p\n", a[1], b[1], &a, b, &b); } // CHECK:111 printf("%hd %hd %p %p %p\n", a[1], b[1], &a, b, &b); // 111 111 p1d p2d p3d } In target data we already put a and b in use_device_addr. That indicates all use of a and b will be the corresponding device addresses. Therefore, in target directive, we should use is_device_address instead of has_device_addr. The correct way to use has_device_addr is, we already map the list items by using target data w/o use_device_addr. Then when we launch a kernel using target directive with has_device_addr, we tell the target region, the list items *should* be there, and use them, otherwise it is an error (we choose to error out for the undefined behavior). @jyu2 WDYT? Comment Actions
I think you are talking about is_device_ptr clause. There is no is_device_address clause in OpenMP. The is_device_ptr clause is meant only for "ptrs" (pointers). For example: int *p = omp_target_alloc(...); #pragma omp target is_device_ptr(p)
Based on a brief discussion with some members of the OpenMP spec committee, the idea for "has_device_addr" is to have the address passed-in directly (as a literal, similar to is_device_ptr) into the target region, without any map lookup. So, there is no requirement that the variable has to be mapped, or tracked by libomptarget. That requirement is for map(present:x). One example use-case from @dreachem is this: #pragma omp requires unified_shared_memory int x ; printg("%p\n", &x); // p1h printf("%d\n", omp_target_is_present(&x, omp_get_default_device())); // 0 (x is not "present", as per the OpenMP runtime) #pragma omp target has_device_addr(x) print("%p\n", &x); // p1h (same address as on the host side) } In this case, because of unified shared memory, x is accessible on device as well, even though it is not mapped, or made declare target etc. So we need to pass the address of x into the region, even though omp_target_is_present would return false for it. In terms of the code emitted, the original idea of passing the address in as a LITERAL, similar to is_device_ptr is the right way to think about it. %x = alloca i32 ; Original allocation for x Map: <ptr %x, ptr %x, sizeof(ptr), PARAM|LITERAL> Outlined function: define void @outlined...(ptr %x) { ... call i32 @printf(..., ptr %x) ... } Comment Actions Oh that's correct. I directly copied from Jennifer's comment. ;-)
That's true, but what about the case I mentioned? It is also supposed to use is_device_ptr.
No. I think you are mixing things up. The spec says:
It only indicates the list items already have device addresses. I don't think it has another level of meaning that, the list of variables listed are device addresses. The second part above is, "they may be directly accessed from a target device". My reading is, they may be directly accessed from a target device "without a mapping", which exactly the map(present:x) you suggested indicates. And yes, we don't need extra flag for that. present is exactly we need here. Comment Actions After reading some threads (https://github.com/OpenMP/spec/issues/2178#issue-622053885 and https://github.com/OpenMP/spec/issues/1870), I think has_device_addr is not supposed to have any semantics about creating a mapping, especially in https://github.com/OpenMP/spec/issues/2178#issue-622053885 it is used as a equivalent of firstprivate. Comment Actions If it is okay with is_divece_ptr. I am okay with it. Thanks. Comment Actions In case you are not already aware, the test is still failing after your fix: |