This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Fix alignment calculation for mapping struct members.
AbandonedPublic

Authored by pavelkopyl on Jan 24 2023, 2:30 PM.

Details

Summary

This patch fixes calculation of 'HstPtrBegin' address to prevent
'explicit extension not allowed: ...' error that may happen when the
structure being mapped has less that 8-byte alignment:

#include <omp.h>
int main() {
  struct S {
    int i;
    int j;
  } s;
  s.i = 20;
  s.j = 30;
  #pragma omp target data map(tofrom : s)
  {
    #pragma omp target map(from : s.i, s.j)
    {
      s.i = 21;
      s.j = 31;
    }
  }
  return 0;
}

In case 's' object is only 4-byte aligned (please, see D135462 for details),
we may get the following error:

...
Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc9b2581e4, Size=8)...
Libomptarget --> Creating new map entry with HstPtrBase=0x00007ffc9b2581e4, HstPtrBegin=0x00007ffc9b2581e4...
...
Libomptarget --> Moving 8 bytes (hst:0x00007ffc9b2581e4) -> (tgt:0x0000561d0739b390)
...
Libomptarget --> Entry  0: Base=0x00007ffc9b2581e4, Begin=0x00007ffc9b2581e4, Size=8, Type=0x20, Name=unknown
Libomptarget --> Entry  1: Base=0x00007ffc9b2581e4, Begin=0x00007ffc9b2581e4, Size=4, Type=0x1000000000002, Name=s.i
Libomptarget --> Entry  2: Base=0x00007ffc9b2581e4, Begin=0x00007ffc9b2581e8, Size=4, Type=0x1000000000002, Name=s.j
...
Libomptarget --> Using a padding of 4 bytes for begin address 0x00007ffc9b2581e4
Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc9b2581e0, Size=12)...
Libomptarget --> WARNING: Pointer is not mapped but section extends into already mapped data
Libomptarget message: explicit extension not allowed: host address specified is 0x00007ffc9b2581e0 (12 bytes),
 but device allocation maps to host at 0x00007ffc9b2581e4 (8 bytes)

Here we have wrong HstPtrBegin address - 0x00007ffc9b2581e0 that points
to a memory before 's' object itself, which starts at HstPtrBase (0x00007ffc9b2581e4).
This, in turn, leads to inconsistency in mapping sizes created and requested.

Diff Detail

Event Timeline

pavelkopyl created this revision.Jan 24 2023, 2:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 2:30 PM
pavelkopyl requested review of this revision.Jan 24 2023, 2:30 PM
Herald added a project: Restricted Project. · View Herald Transcript
pavelkopyl edited the summary of this revision. (Show Details)

Why not add the test?

pavelkopyl added a comment.EditedJan 24 2023, 2:47 PM

Hi All,

To be honest I'm new in openmp runtime implementation. I just wanted to fix an issue originally revealed by https://reviews.llvm.org/D135462, as it blocks landing that patch. I'm not 100% sure this is right approach, probably we need to fix front-end instead. Anyway I'd like to discuss it here.

Is that link correct? It points to a very old and seemingly irrelevant patch.

Is that link correct? It points to a very old and seemingly irrelevant patch.

Sorry, this is correct one: https://reviews.llvm.org/D135462

Why not add the test?

Sure, I'm going to add a test for this. I guess it will be written in LLVM IR, as I need to specify non-default alignment for a structure to be mapped.
Will it be OK? I just see that all the tests from /openmp/libomptarget/test/ are written in C.

Padding was added to libomptarget in order to ensure 8-byte struct members remain properly aligned, e.g. pointers, doubles etc. If you remove padding, then such a member may find itself in a non 8-aligned address, making its access result in a segfault.

For instance, in the code you provided, add a third struct member k of type double

struct S {
  int i;
  int j;
  double k;
}

and partially map the struct map(s.j, s.k). There are architectures (e.g. CUDA for sure) where memalloc will always return an 8-aligned (or even higher) address. If we don't add a 4-byte padding at the beginning of the struct on the device, j will start at an address of the form 0x...0, then k will find itself at an address of the form 0x...4, i.e. k will not be 8-aligned. Try accessing it on virtually any architecture and you'll get a segfault. By adding 4 dummy bytes at the beginning of the struct, j finds itself at 0x...4 and k at 0x...8, which is what we need to ensure.

I'm not sure I understand the logic between the subtraction here. Instead, I would suggest we adjust the effective alignment used as follows:

int64_t EffectiveAlignment = (int64_t)HstPtrBase % Alignment;

Which should give us the minimum of the two alignments (8 byte and base object alignment), right?
If so, that is what we should be using for the object. So, replace Alignment use with EffectiveAlignment which should shrink Padding in your use case to 0.

For instance, in the code you provided, add a third struct member k of type double

The struct will have an alignment of 8 then, it will adjust according to members. We should simply not assume 8 but use the actual base pointer alignment.

The struct will have an alignment of 8 then, it will adjust according to members.

Sure, but that's true for the entire struct on the host. If you map it partially on the device, you may break this alignment, i.e. the partially mapped struct may no longer satisfy alignment requirements.

Just to summarize, the current implementation of padding in libomptarget has a bug, but it was hidden until we changed default alignment of allocas in D135462. This bug is now gating D135462 and Pavel and I were trying to investigate it. @jdoerfert, @grokos, your feedback is very welcome, since we're not really familiar with the code in OpenMP runtime.

jdoerfert added a comment.EditedJan 24 2023, 5:43 PM

Just to summarize, the current implementation of padding in libomptarget has a bug, but it was hidden until we changed default alignment of allocas in D135462. This bug is now gating D135462 and Pavel and I were trying to investigate it.

Yes, and unfortunately yes.

The struct will have an alignment of 8 then, it will adjust according to members.

Sure, but that's true for the entire struct on the host. If you map it partially on the device, you may break this alignment, i.e. the partially mapped struct may no longer satisfy alignment requirements.

The base has always the alignment we are looking for. And "base" here is the entire struct (aka HstPtrBase).
Right now we assume the base is at least 8 aligned, but in the case of two integers it's not. Hence the error.
We cannot align any member with more alignment than the base, which is what is happening here.
However, clang will align the base according to the members, so if there is an 8byte pointer (or double) in there, base will be 8 aligned.
If there is a long double in there, base will be 16 aligned, etc.

All that said, if you think my reasoning is flawed, could you provide an example?
My development server is down so I can't write the patch I have in mind myself rn.

Padding was added to libomptarget in order to ensure 8-byte struct members remain properly aligned, e.g. pointers, doubles etc. If you remove padding, then such a member may find itself in a non 8-aligned address, making its access result in a segfault.

For instance, in the code you provided, add a third struct member k of type double

struct S {
  int i;
  int j;
  double k;
}

and partially map the struct map(s.j, s.k). There are architectures (e.g. CUDA for sure) where memalloc will always return an 8-aligned (or even higher) address. If we don't add a 4-byte padding at the beginning of the struct on the device, j will start at an address of the form 0x...0, then k will find itself at an address of the form 0x...4, i.e. k will not be 8-aligned. Try accessing it on virtually any architecture and you'll get a segfault. By adding 4 dummy bytes at the beginning of the struct, j finds itself at 0x...4 and k at 0x...8, which is what we need to ensure.

Thank you for clarification. I understand that we need to care about alignment of partially mapped structures on a device side. As I see, the only small issue here is in calculation of Padding value and updating mapping begin address:

HstPtrBegin = (char *)HstPtrBegin - Padding;

In case of a 4-aligned structure that has two integers, we may get, which is wrong I guess. This patch just limits Padding value. If padded begin address is less that HstPtrBase, we just need to map starting from beginning of the structure. Does this make sense?

  • Limit Padding values

Your approach fails the second scenario in the D142586 test case. The problem is you look at the first member mapped but that is not necessarily the first member mapped later.

Your approach fails the second scenario in the D142586 test case. The problem is you look at the first member mapped but that is not necessarily the first member mapped later.

OK, that you for clarification.