Page MenuHomePhabricator

[OPENMP]Fix PR49649: The introduction of $ref globals is not always valid.
Needs ReviewPublic

Authored by ABataev on Mar 25 2021, 8:52 AM.

Details

Summary

Need to emit the reference in the global address space and the
referenced to static variable should be bitcasted to the generic address
space to be compatible with NVPTX.
Required to avoid early optimization of the static variables

Event Timeline

ABataev created this revision.Mar 25 2021, 8:52 AM
ABataev requested review of this revision.Mar 25 2021, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2021, 8:52 AM

The cast was never the problem but the fact that the $ref lives in the global address space while the global value might be in the shared one.
D101030 gives a way out here by not creating $ref variables if there is no host version of a global. This makes sense because we don't need to copy to the global ever.
So, users can have static globals in shared memory by ensuring they are not host accessible. We probably should emit an error if they try to do global in shared memory
that has a host version.

Long story short, this cast won't help.

The cast was never the problem but the fact that the $ref lives in the global address space while the global value might be in the shared one.
D101030 gives a way out here by not creating $ref variables if there is no host version of a global. This makes sense because we don't need to copy to the global ever.
So, users can have static globals in shared memory by ensuring they are not host accessible. We probably should emit an error if they try to do global in shared memory
that has a host version.

Long story short, this cast won't help.

I checked that this cast helps to fix a problem with ptxas with shared/global memory refs. I'm not saying that this is the best solution, but better to have a fix while the final solution is not landed. Plus. maybe, it will fix some other potential issues until the $refs will go away. Maybe need a new attribute to avoid early optimization of the internal vars.

I don't understand why the cast makes ptxas happy, can you include a test in the runtime so we can make sure that stays that way and is not some ptxas artifact. After all, we still point to a shared memory symbol from global memory which doesn't make too much sense.

I don't understand why the cast makes ptxas happy, can you include a test in the runtime so we can make sure that stays that way and is not some ptxas artifact. After all, we still point to a shared memory symbol from global memory which doesn't make too much sense.

Here is the related info https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers. Take a look at the Examples:

      .const  .u32 foo = 42;
...
      .global .u32 p2 = generic(foo); // generic address of foo

After the patch, we have this:

.shared .align 4 .u32 _ZL1X;
.global .align 8 .u64 __ZL1X$ref = generic(_ZL1X);

which is similar to the example.

ABataev updated this revision to Diff 360577.Jul 21 2021, 1:15 PM

Added runtime test.

Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2021, 1:15 PM

Better to fix this before the next release.