Page MenuHomePhabricator

[CUDA] Change initializer for CUDA device code based on CUDA documentation.
ClosedPublic

Authored by wengxt on Aug 21 2015, 9:31 AM.

Details

Summary

According to CUDA documentation, global variables declared with device,
constant can be initialized from host code, so mark them as
externally initialized. Because shared variables cannot have an
initialization as part of their declaration and since the value maybe kept
across different kernel invocation, the value of shared is effectively
undefined instead of zero initialized.

Wrongly using zero initializer may cause illegitimate optimization, e.g.
removing unused constant variable because it's not updated in the device
code and the value is initialized with zero.

Test Plan: test/CodeGenCUDA/address-spaces.cu

Patch by Xuetian Weng

Diff Detail

Event Timeline

wengxt updated this revision to Diff 32834.Aug 21 2015, 9:31 AM
wengxt retitled this revision from to [CUDA] Change initializer for CUDA device code based on CUDA documentation..
wengxt updated this object.
wengxt added reviewers: jingyue, jholewinski.
wengxt added a subscriber: llvm-commits.
jingyue edited edge metadata.Aug 21 2015, 10:03 AM

Looks good in general.

According to CUDA documentation, global variable with device,

global variables

externally initialized. For shared variable, its value might be kept
by GPU Memory among different kernel invocation and it is not possible
to initialize it. So change shared variable it to undefined value.

Just say: shared variables cannot have an initialization as part of their declaration. Whether being kept across invocations is undefined.

lib/CodeGen/CodeGenModule.cpp
1998

Then, this should be a hard error instead of an unsupported feature, right? ErrorUnsupported sounds like we want to support it later.

tra edited edge metadata.Aug 21 2015, 10:23 AM

+1 to jingyue's comments.

lib/CodeGen/CodeGenModule.cpp
1995

please use getLangOpts()

2088–2098

Cosmetic nit: no closing quote marks in both citations.

You may want to leave only one copy if "is accessible..." as it applies to both B.2.1 and B.2.2.

Just say: shared variables cannot have an initialization as part of their declaration. Whether being kept across invocations is undefined.

My intention is to explain that why shared varible should be undefined instead of zero-initialized. Being unable to have initialization as declaration does not assume that it may contain random value.

OK. Then some rewording needs to be done.

The reason shared variables should be initialized as undef is that CUDA's language manual explicitly says that it can't be initialized.

The reason they shouldn't be initialized as zero is sort of what you explained in the second paragraph: compilers might perform optimizations that leverage the zero initializer. However, such optimizations are illegimate because, at runtime, data previously stored to shared memory might be left over.

Is this more accurate?

wengxt updated this revision to Diff 32856.Aug 21 2015, 1:09 PM
wengxt edited edge metadata.

update based on comments.

wengxt updated this revision to Diff 32858.Aug 21 2015, 1:26 PM
wengxt marked 2 inline comments as done.

make shared with initialization a hard error.

wengxt marked an inline comment as done.Aug 21 2015, 1:26 PM
wengxt updated this object.Aug 21 2015, 1:27 PM
jingyue accepted this revision.Aug 21 2015, 1:27 PM
jingyue edited edge metadata.

LGTM

This revision is now accepted and ready to land.Aug 21 2015, 1:27 PM
jingyue updated this object.Aug 21 2015, 1:28 PM
jingyue edited edge metadata.
jingyue closed this revision.Aug 21 2015, 10:50 PM
tra added a comment.Aug 31 2015, 3:25 PM

Following code compiles with nvcc, but errors out with clang and this patch due to InitExpr check in CodeGenModule.cpp. It looks like the check needs to be more selective.

struct c {
  c() {}
};

__shared__ c var;
tra added a comment.Aug 31 2015, 3:56 PM
In D12241#236830, @tra wrote:

Following code compiles with nvcc, but errors out with clang and this patch due to InitExpr check in CodeGenModule.cpp. It looks like the check needs to be more selective.

struct c {
  c() {}
};

__shared__ c var;

Looks like we don't even need explicit constructor to fail:

struct c { };
__shared__ c var;

I believe it's something that we *do* want to support.