This is an archive of the discontinued LLVM Phabricator instance.

[Clang][OpenMP] Fix the issue that list items in `has_device_addr` are still mapped to the target device
ClosedPublic

Authored by tianshilei1992 on Jan 12 2023, 12:20 PM.

Details

Summary

This patch fixes the issue that list items in has_device_addr are still mapped
to the target device because front end emits map type OMP_MAP_TO.

Fix #59160.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 12:20 PM
tianshilei1992 requested review of this revision.Jan 12 2023, 12:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 12:20 PM
abhinavgaba added a comment.EditedJan 12 2023, 1:02 PM

Hi there, I'm trying to fix https://github.com/llvm/llvm-project/issues/59160. The faulty case is basically like the following:

void xoo() {
  short a[10], b[10];
  a[1] = 111;
  b[1] = 111;
#pragma omp target data map(to : a[0 : 2], b[0 : 2]) use_device_addr(a, b)
#pragma omp target has_device_addr(a) has_device_addr(b[0])
  {
    a[1] = 222;
    b[1] = 222;
    // CHECK: 222 222
    printf("%hd %hd %p %p %p\n", a[1], b[1], &a, b, &b);
  }
  // CHECK:111
  printf("%hd %hd %p %p %p\n", a[1], b[1], &a, b, &b); // 111 111 p1d p2d p3d
}

It looks like at runtime, we are trying to copy a (device) pointer to a device pointer by using host to device data transfer. I noticed that's because we have TO flag marked for the argument. However, since a and b are in has_device_addr, we are not supposed to map the two variables right?

The first thing is that in the test itself, has_device_addr(b[0]) is incorrect. b inside the "target data" region refers to the device version of b, not the host version. So, it is illegal to do b[0] (without unified shared memory) because we cannot load from device b on the host (target data is executed on host).

After that is changed to has_device_addr(b), the test will likely pass.

However, it is still true that has_device_addr(a, b) are being treated the same as map(to:a, b) for arrays, instead of passing the addresses &a, &b directly into the kernel as LITERALs. I think @jyu2 was working on changing that, so she might be able to say what changes are needed in clang for that.

tianshilei1992 added a comment.EditedJan 12 2023, 1:21 PM

I agree that b is not right here, but that doesn’t matter because I stepped into the runtime library and it crashed when processing a.

And why are they treated as to?

I agree that b is not right here, but that doesn’t matter because I stepped into the runtime library and it crashed when processing a.

And why are they treated as to?

Treating has_device_addr(a) as map(a) is incorrect. I think it is just a vestige of the prior implementation, where has_device_addr was fully ignored and instead (map(tofrom)) kicked in for a (which is the implicit map for arrays).

The test likely passed on x86_64 plugin because it just re-mapped the output of use_device_addr(a), which is a device address, again, but on architectures without unified memory, this re-mapping won't work, hence the failure you see with Cuda.

#include <stdio.h>
int main() {
    short a[10], b[10];
    a[1] = 111;
    b[1] = 111;
    printf("%hd %hd %p %p\n", a[1], b[1], &a, &b); // 111 111 p1h p2h
  #pragma omp target data map(to : a[0 : 2], b[0 : 2]) use_device_addr(a, b)
    {
    printf("%p %p\n", &a, &b); // p1d p2d
  #pragma omp target has_device_addr(a) has_device_addr(b)
    {
      a[1] = 222;
      b[1] = 222;
      printf("%hd %hd %p %p\n", a[1], b[1], &a, &b); // 222 222 p1d p2d
    }
    }
    // CHECK:111
    printf("%hd %hd %p %p\n", a[1], b[1], &a, &b); // 111 111 p1h p2h
  }
$ clang -O0 -fopenmp -fopenmp-targets=x86_64 hda_test.c -fopenmp-version=51 && ./a.out
111 111 0x7fff2a47ecb0 0x7fff2a47ec90 // 111 111 p1h p2h
0x55f3cf685b10 0x55f3cf685c10 // p1d p2d: device versions of p1h, p2h
222 222 0x55f3cf685d70 0x55f3cf685e70 // p1dd p2dd:  another different device version of the two, because of tthe remapping. These should have been "p1d p2d".
111 111 0x7fff2a47ecb0 0x7fff2a47ec90 // 111 111 p1h p2h

FWIW, I think has_device_addr(b[0]) is not trying to take the value of b[0] in this case. Instead, it's just to take the address of the first element of b. Only pointer arithmetic will be involved. It's not necessarily illegal to do it in that way.

FWIW, I think has_device_addr(b[0]) is not trying to take the value of b[0] in this case. Instead, it's just to take the address of the first element of b. Only pointer arithmetic will be involved. It's not necessarily illegal to do it in that way.

That is true. It would be wrong if b was a pointer, for which &b[0] computation would have involved a load.

jyu2 added a comment.Jan 13 2023, 8:43 AM

That part of code is original add for is_device_address, so I just wonder, if the change could break is_device_address?

That part of code is original add for is_device_address, so I just wonder, if the change could break is_device_address?

Now I kinda think it is not right to mix is_device_address and has_device_addr.

Basically, is_device_address means the list items are device address, so the address should be taken as literal, directly passed to the kernel.

On the other hand, has_device_addr indicates that the list items *should* have device address, which means there has to be an entry for that. Note that it is different from the OpenMP map clause. OpenMP's map clause (w/o always of course) means if the list items are not mapped, do it, and transfer the data accordingly; otherwise, use the one in the map table. I think has_device_addr only means map table lookup. Use it if found, otherwise undefined behavior (per spec). We are not supposed to update mapping table.

So back this patch, or clang front end, I think the correct way to handle this is to create a new flag, indicating the mapping is supposed to exist. The runtime needs to be changed accordingly in a way that if the flag is set, it should error out if it doesn't find any mapping.

Meanwhile, I think the test case for has_device_addr is not correct.

void xoo() {
  short a[10], b[10];
  a[1] = 111;
  b[1] = 111;
#pragma omp target data map(to : a[0 : 2], b[0 : 2]) use_device_addr(a, b)
#pragma omp target has_device_addr(a) has_device_addr(b[0])
  {
    a[1] = 222;
    b[1] = 222;
    // CHECK: 222 222
    printf("%hd %hd %p %p %p\n", a[1], b[1], &a, b, &b);
  }
  // CHECK:111
  printf("%hd %hd %p %p %p\n", a[1], b[1], &a, b, &b); // 111 111 p1d p2d p3d
}

In target data we already put a and b in use_device_addr. That indicates all use of a and b will be the corresponding device addresses. Therefore, in target directive, we should use is_device_address instead of has_device_addr. The correct way to use has_device_addr is, we already map the list items by using target data w/o use_device_addr. Then when we launch a kernel using target directive with has_device_addr, we tell the target region, the list items *should* be there, and use them, otherwise it is an error (we choose to error out for the undefined behavior).

@jyu2 WDYT?

In target data we already put a and b in use_device_addr. That indicates all use of a and b will be the corresponding device addresses. Therefore, in target directive, we should use is_device_address instead of has_device_addr. The correct way to use has_device_addr is, we already map the list items by using target data w/o use_device_addr. Then when we launch a kernel using target directive with has_device_addr, we tell the target region, the list items *should* be there, and use them, otherwise it is an error (we choose to error out for the undefined behavior).

I think you are talking about is_device_ptr clause. There is no is_device_address clause in OpenMP.

The is_device_ptr clause is meant only for "ptrs" (pointers). For example:

int *p = omp_target_alloc(...);
#pragma omp target is_device_ptr(p)

On the other hand, has_device_addr indicates that the list items *should* have device address, which means there has to be an entry for that.

Based on a brief discussion with some members of the OpenMP spec committee, the idea for "has_device_addr" is to have the address passed-in directly (as a literal, similar to is_device_ptr) into the target region, without any map lookup. So, there is no requirement that the variable has to be mapped, or tracked by libomptarget. That requirement is for map(present:x).

One example use-case from @dreachem is this:

#pragma omp requires unified_shared_memory

int x ;

printg("%p\n", &x); // p1h
printf("%d\n", omp_target_is_present(&x, omp_get_default_device())); // 0 (x is not "present", as per the OpenMP runtime)

#pragma omp target has_device_addr(x)
print("%p\n", &x); // p1h (same address as on the host side)
}

In this case, because of unified shared memory, x is accessible on device as well, even though it is not mapped, or made declare target etc. So we need to pass the address of x into the region, even though omp_target_is_present would return false for it.

In terms of the code emitted, the original idea of passing the address in as a LITERAL, similar to is_device_ptr is the right way to think about it.

%x = alloca i32 ; Original allocation for x

Map:
<ptr %x, ptr %x, sizeof(ptr), PARAM|LITERAL> 

Outlined function:
define void @outlined...(ptr %x) {
   ...
   call i32 @printf(..., ptr %x)
   ...
}

In target data we already put a and b in use_device_addr. That indicates all use of a and b will be the corresponding device addresses. Therefore, in target directive, we should use is_device_address instead of has_device_addr. The correct way to use has_device_addr is, we already map the list items by using target data w/o use_device_addr. Then when we launch a kernel using target directive with has_device_addr, we tell the target region, the list items *should* be there, and use them, otherwise it is an error (we choose to error out for the undefined behavior).

I think you are talking about is_device_ptr clause. There is no is_device_address clause in OpenMP.

Oh that's correct. I directly copied from Jennifer's comment. ;-)

The is_device_ptr clause is meant only for "ptrs" (pointers). For example:

int *p = omp_target_alloc(...);
#pragma omp target is_device_ptr(p)

That's true, but what about the case I mentioned? It is also supposed to use is_device_ptr.

On the other hand, has_device_addr indicates that the list items *should* have device address, which means there has to be an entry for that.

Based on a brief discussion with some members of the OpenMP spec committee, the idea for "has_device_addr" is to have the address passed-in directly (as a literal, similar to is_device_ptr) into the target region, without any map lookup. So, there is no requirement that the variable has to be mapped, or tracked by libomptarget. That requirement is for map(present:x).
In terms of the code emitted, the original idea of passing the address in as a LITERAL, similar to is_device_ptr is the right way to think about it.

No. I think you are mixing things up. The spec says:

The has_device_addr clause indicates that its list items already have device addresses and therefore they may be directly accessed from a target device.

It only indicates the list items already have device addresses. I don't think it has another level of meaning that, the list of variables listed are device addresses. The second part above is, "they may be directly accessed from a target device". My reading is, they may be directly accessed from a target device "without a mapping", which exactly the map(present:x) you suggested indicates. And yes, we don't need extra flag for that. present is exactly we need here.

tianshilei1992 planned changes to this revision.Jan 20 2023, 11:56 AM
tianshilei1992 added a comment.EditedApr 22 2023, 10:39 AM

After reading some threads (https://github.com/OpenMP/spec/issues/2178#issue-622053885 and https://github.com/OpenMP/spec/issues/1870), I think has_device_addr is not supposed to have any semantics about creating a mapping, especially in https://github.com/OpenMP/spec/issues/2178#issue-622053885 it is used as a equivalent of firstprivate.
https://github.com/OpenMP/spec/issues/2178#issuecomment-631751755 mentions is_device_ptr has to convert the device pointer to device address, while has_device_addr doesn't, and since in LLVM OpenMP implementation device pointer is treated as device address, passing them as literal makes sense, so the fix is valid.

jyu2 added a comment.Apr 27 2023, 8:31 AM

If it is okay with is_divece_ptr. I am okay with it.
Please add a lit test for this?

Thanks.
Jennifer

jyu2 accepted this revision.May 25 2023, 8:14 AM

LGTM

This revision is now accepted and ready to land.May 25 2023, 8:14 AM
This revision was landed with ongoing or failed builds.May 25 2023, 5:19 PM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.May 25 2023, 6:12 PM

In case you are not already aware, the test is still failing after your fix:
https://lab.llvm.org/buildbot/#/builders/139/builds/41495

In case you are not already aware, the test is still failing after your fix:
https://lab.llvm.org/buildbot/#/builders/139/builds/41495

thanks. I’ll fix it soon or I’ll revert it.