This is an archive of the discontinued LLVM Phabricator instance.

[llvm][CUDA] Make NVVM Reflect pass look inside ptr casting
ClosedPublic

Authored by hdelan on Dec 2 2022, 10:42 AM.

Diff Detail

Event Timeline

hdelan created this revision.Dec 2 2022, 10:42 AM
hdelan requested review of this revision.Dec 2 2022, 10:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2022, 10:42 AM
hdelan added subscribers: tra, bader.EditedDec 2 2022, 10:45 AM

Instead of adding an __nvvm_reflect clang builtin (https://reviews.llvm.org/D137154), it was decided that it is better to adjust the NVVMReflect pass so that __nvvm_reflect can be used when wrapped in pointer casting. This allows __nvvm_reflect to be used in openCL.

hdelan retitled this revision from Make NVVM Reflect pass look inside ptr casting to [llvm][CUDA] Make NVVM Reflect pass look inside ptr casting.Dec 2 2022, 10:47 AM
tra added a comment.Dec 2 2022, 10:51 AM

It would be great to add a test demonstrating what the code is intended to do.

tra added inline comments.Dec 2 2022, 10:55 AM
llvm/include/llvm/IR/InstrTypes.h
1407 ↗(On Diff #479675)

Does it need a type check as it's done in getCalledFunction() above after a similar cast?

hdelan added inline comments.Dec 2 2022, 11:02 AM
llvm/include/llvm/IR/InstrTypes.h
1407 ↗(On Diff #479675)

The type checking above is checking that the operands match for the function and the call. However because we are removing ptr casts the operands will be different for the call and the func with removed ptr casts

hdelan updated this revision to Diff 479696.Dec 2 2022, 11:32 AM
  • Adding test
tra added inline comments.Dec 2 2022, 12:00 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

Interesting.

Wouldn't it be better/easier to generate ASC(4->0) of the string argument instead of casting the __nvvm_reflect type?
Or declare an overload of __nvvm_reflect() accepting the pointer in AS(4)?

I'm OK with this change, but I want to understand the full picture, and have the motivation/context for the change recorded. Otherwise it's not clear why one would need this at all.

hdelan added inline comments.Dec 2 2022, 12:09 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

This isn't possible when dealing with llvm intrinsics that do not map to clang builtins. When the first __nvvm_reflect call occurs in a module, a declaration is made based on how it is called. This make a declaration for AS4, AS1 etc. Then if say __nvvm_reflect is called again in the same module from a different AS, ptr casting will be employed so that the existing declaration will work with the new addrspace of the parameter.

When using the clang builtin this doesn't happen since use of __nvvm_reflect builtin clearly maps to the llvm intrinsic, which isn't tied to a particular address space.

tra added inline comments.Dec 2 2022, 3:02 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

When the first __nvvm_reflect call occurs in a module, a declaration is made based on how it is called

Are you saying that the code relies on implicit declaration of __nvvm_reflect and does not provide it explicitly?
What if someone calls __nvvm_reflect(1) ?

I do not see how the declaration which would be provided by compiler for __nvvm_reflect if it were a builtin would be different from providing such a declaration in the user code.

When using the clang builtin this doesn't happen since use of __nvvm_reflect builtin clearly maps to the llvm intrinsic, which isn't tied to a particular address space.

I'm not sure what does it have to do with the intrinsics at all. All you need is a declaration for a function extern "C" int__nvvm_reflect(const char*);
https://godbolt.org/z/MoqMYfjcx

My understanding was that the problem you needed to solve is that constant strings lived in AS(4) so the only piece missing is ability of NVVMReflect pass to look through addrspacecast AS(4 or whatever) ->AS(0) which would be wrapping the string constant in your case.

What do I miss?

hdelan added inline comments.Dec 2 2022, 3:02 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

And we cannot do any casting within openCL since strings must be in the __constant address space

hdelan added inline comments.Dec 2 2022, 3:13 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

A clarification: The code would rely on some extern "C" __nvvm_reflect() declaration in user code. The problem is that in openCL strings are in the __constant address space, so in order to use __nvvm_reflect you need to declare extern "C" __nvvm_reflect(__constant char *), whereas in non openCL code you can declare extern "C" __nvvm_reflect(char *).

Whichever IR declaration appears first in the module will be taken as the one declaration of __nvvm_reflect something that would look like:
declare dso_local i32 @__nvvm_reflect(i8* noundef).

When another TU makes use of __nvvm_reflect from a different address space, CodeGen will attempt to make this other extern "C" declaration fit with the pre existing IR declaration, which necessitates ptr casting either from addrspace. This generates a call matching this line here in the test. In order for __nvvm_reflect to work as desired we want to get rid of this pointer casting.

hdelan added inline comments.Dec 2 2022, 3:14 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

Sorry they should be extern "C" int __nvvm_reflect(...)

tra added inline comments.Dec 2 2022, 3:16 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

Nothing stops you declaring nvvm_reflect with a __constant argument: https://godbolt.org/z/jTbvcsa3a
NVVM reflect does not care as long as we can get the string value.

tra added inline comments.Dec 2 2022, 4:10 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

So the problem is:

  • OpenCL has string constants in its own AS and can't cast it to generic AS (I wasn't aware of that).
  • Declaring OpenCL-specific __nvvm_reflect(__constant char*) variant makes compilation work for OpenCL w/o having to add any casts.
  • However, it results in different signatures on LLVM IR level,
  • Different signatures result in necessity of typecasting between the two __nvvm_reflect() variants.
  • and that's why you had to strip casts from the function, and not the argument.

OK. I think it mostly makes sense now.

So, if __nvvm_reflect were a builtin, accepting const char *, how would it work with OpenCL? You would still not be allowed to pass it a string constant, right?

Can we side-step the problem by using extern "C" int __nvvm_reflect(...) everywhere? This way compiler will happily pass whatever pointer you give it, though the downside will be that there will be no diagnostics if you try to pass is a weird argument. It also only works for C++ variant of OpenCL. Otherwise variadic function would still need an argument.

Having mismatched __nvvm_reflect IR declarations and resulting casts is probably benign, given that it can never be a real function and the casts will go away after NVVMReflect is done.

How about if we side-step the problem altogether and use something as simple as this: https://godbolt.org/z/oY7coWz35

int __nvvm_reflect(const char *);

int wrapper(__constant const char *arg) {
    uintptr_t p = (uintptr_t)arg;
    return __nvvm_reflect((const char *) p);
}
kernel void k(global int* in1, global int* in2, global int* out) {
     out[0] = wrapper("__CUDA_ARCH");
}
hdelan added inline comments.Dec 4 2022, 3:47 PM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

This kind of pointer casting that you attach above generates the IR:

tail call i32 @__nvvm_reflect(i8* noundef inttoptr (i64 ptrtoint ([12 x i8] addrspace(4)* @.str5 to i64) to i8*))

Which forces the assertion in NVVMReflectPass to fail:

assert(isa<ConstantDataSequential>(Operand))

So the logic of the pass will need to change either to accommodate your proposed approach, or to allow for the casting to be removed from around the function call, as I propose.

hdelan added inline comments.Dec 6 2022, 9:15 AM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

Which approach do we think is preferable? Allowing __nvvm_reflect to be declared as an extern "C" func with any addrspace const char * parameter, or changing the pass to allow __nvvm_reflect to take non a ConstantDataSequential string (as is made happen with pointer casting)?

tra added inline comments.Dec 6 2022, 10:09 AM
llvm/test/CodeGen/NVPTX/nvvm-reflect-cast.ll
16 ↗(On Diff #479696)

tail call i32 @__nvvm_reflect(i8* noundef inttoptr (i64 ptrtoint ([12 x i8] addrspace(4)* @.str5 to i64) to i8*))

That was the idea.

assert(isa<ConstantDataSequential>(Operand))
So the logic of the pass will need to change either to accommodate your proposed approach, o

Correct. We'll need to dig through to the string constant. I think this is a marginally better approach compared to ending up with different declarations for __nvvm_reflect.

I wonder if there's a way to get bitcast instead of inttotr/ptrtoint in the source code. Bitcast would be better (less ops to see through), but I'm not sure how to get it in C++ or opencl.

Also, after tinkering with this approach for a bit, I've realized that ptrtoint->inttoptr has another issue -- there may be additional loads/stores in-between that we'll also need to deal with. https://godbolt.org/z/cWTvf33fY

It's probably doable, but it may not be worth complicating things just for that.
I guess the simplest thing we can do is to introduce __nvvm_reflect_ocl() which we can then declare with an argument in AS(4) without conflict with the signature for the regular __nvvm_reflect().

So, my preferences would be, in order:

  • allow __nvvm_reflect() with arguments bitcast'ed or ASC'ed from constants in other AS'es, if we can figure out how to produce bitcast from OpenCL code.
  • introduce __nvvm_reflect_ocl(...) with an argument in AS(4) so it works with OpenCL strings.
  • allow __nvvm_reflect() with arguments behind ptrtoint/inttoptr and, possibly, intermediate stores/loads
hdelan updated this revision to Diff 482780.Dec 14 2022, 3:12 AM

Changing the approach to define __nvvm_reflect_ocl which can take a const char * in
AS(4) so it can be used in openCL.

hdelan updated this revision to Diff 482782.Dec 14 2022, 3:13 AM

Change description in test

hdelan updated this revision to Diff 482784.Dec 14 2022, 3:16 AM

Changing NVVMReflect pass to allow for nvvm_reflect_ocl to be substituted out,
which allows for using
nvvm_reflect_ocl with an AS(4) const char * param.

Hi @tra , after some unsuccessful attempts at changing ptr casting to bitcasting, I have implemented your second preference. That is introduce __nvvm_reflect_ocl(...) with an argument in AS(4) so it works with OpenCL strings. Let me know your thoughts

tra accepted this revision.Dec 14 2022, 9:58 AM

LGTM. Thank you for trying all these different variants and helping me understand the requirements and constraints you have on the OpenCL side.

This revision is now accepted and ready to land.Dec 14 2022, 9:58 AM

Fantastic thanks for your help. I do not have permission to commit so could you please commit this patch on my behalf? Thanks

tra added a comment.Dec 14 2022, 10:26 AM

I do not have permission to commit so could you please commit this patch on my behalf? Thanks

Will do.

tra added a comment.Dec 14 2022, 10:27 AM

Should I attribute the commit to your email @codeplay.com ? Or do you prefer something else?

Should I attribute the commit to your email @codeplay.com ? Or do you prefer something else?

Yes that would be great. Thank you

tra added a comment.Dec 14 2022, 10:49 AM

This patch does not apply cleanly to my sources. Is it based on the public LLVM tree? Can you rebase it on top of the main branch?

hdelan added a comment.Jan 3 2023, 1:53 AM
This comment was removed by hdelan.
hdelan updated this revision to Diff 485968.Jan 3 2023, 5:36 AM
hdelan removed subscribers: yaxunl, bader, hiraditya and 4 others.

Clean diff

nikic added a subscriber: nikic.Jan 3 2023, 5:49 AM
nikic added inline comments.
llvm/test/CodeGen/NVPTX/nvvm-reflect-ocl.ll
11

Please use opaque pointers for new tests.

hdelan updated this revision to Diff 485999.Jan 3 2023, 7:58 AM

Using opaque pointers

tra added a comment.Jan 3 2023, 11:43 AM

LGTM. I'll attempt to land it later today.

Great thanks.

This revision was landed with ongoing or failed builds.Jan 4 2023, 12:03 PM
This revision was automatically updated to reflect the committed changes.