This patch fixes a memory error that occurs when we access an aligned array on the device:
    void write_index(int *a, int N) {
      int *aptr __attribute__((aligned(64))) = a;
      // This failed but is fixed by this patch.
    #pragma omp target teams distribute parallel for map(tofrom: aptr[0:N])
      for (int i = 0; i < N; i++) {
        aptr[i] = i;
      }
    }
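For anyone who wants to check the behavior locally, here is a minimal sketch of a verification helper around the reproducer. The name `count_mismatches` is hypothetical and not part of the patch. The sketch assumes a GCC- or Clang-style compiler that accepts `__attribute__((aligned))`. If OpenMP is not enabled, the pragma is ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Same kernel as the reproducer: the pointer variable carries an
 * aligned(64) attribute and is used in the map clause. */
void write_index(int *a, int N) {
  int *aptr __attribute__((aligned(64))) = a;
#pragma omp target teams distribute parallel for map(tofrom: aptr[0:N])
  for (int i = 0; i < N; i++) {
    aptr[i] = i;
  }
}

/* Hypothetical helper (not in the patch): counts elements where a[i] != i.
 * A result of 0 means every index was written back correctly. */
int count_mismatches(const int *a, int N) {
  int bad = 0;
  for (int i = 0; i < N; i++) {
    if (a[i] != i) {
      bad++;
    }
  }
  return bad;
}
```

A caller would fill a buffer with a sentinel value, call `write_index`, and expect `count_mismatches` to return 0. With offloading enabled, a nonzero result or a runtime fault would point to the mapping problem described above.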
Why does this handling need to be different between CPU and GPU offloading? Strictly speaking, I'm not sure why we need the alignment type here, since we'd only get improper alignment on primitive types. My assumption was that it should only care about the alignment of the type itself in all cases. Maybe someone can correct me on that.