This is an archive of the discontinued LLVM Phabricator instance.

[clang][WIP] add option to keep types of ptr args for non-kernel functions in metadata
Needs ReviewPublic

Authored by iliya-diyachkov on Jun 11 2022, 2:26 PM.

Details

Reviewers
Anastasia
bader
svenvh
nikic
Group Reviewers
Restricted Project
Summary

It's a draft for the patch we discussed in the thread of opaque pointer support in the SPIR-V translator. The final decision on whether we will use this approach has not yet been made, so the patch is uploaded for convenience and concertizing of the discussion.

The types are already in metadata for kernel functions by default. Alexey Bader shared the code example with non-kernel functions which lost the info. The patch fixes this case. To enable the feature you need to pass -cl-extra-ptr-info to clang.

TODO:

  • emit only ptr type info metadata if EmitOpenCLExtraPtrInfo is enabled,
  • maybe add the same info to function declarations,
  • maybe add the same options to other languages (SYCL, HLSL ...) or make one common option for all languages.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2022, 2:26 PM
Herald added a subscriber: Naghasan. · View Herald Transcript
iliya-diyachkov requested review of this revision.Jun 11 2022, 2:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2022, 2:26 PM

The patch seems very straight forward. I wonder though if we should always enable this for SPIR-V w/o adding any flag until the untyped pointers are added to SPIR-V so it will become a target setting controlled instead of a flag?

Also is there anything that we don't need from the metadata that are currently emitted for the use case i.e. for example argument names? This could reduce the number of metadata emitted...

I am not sure if this is something that other languages or backends might also need in which case we might need to make this more universal... Ideally it would be nice if it was not in metadata as it is somewhat optional... but potentially we don't have many other alternatives?

beanz added a reviewer: nikic.Jun 15 2022, 7:44 AM
beanz added a comment.Jun 15 2022, 7:47 AM

There are numerous changes floating around that boil down to "we need pointer types". Most are SPIR-V & DXIL related. I'm a little concerned that we are generally building a bunch of different approaches to tracking pointer types and not considering a more general solution that could meet all of the needs.

nikic added a reviewer: Restricted Project.Jun 15 2022, 7:54 AM

Could you please add some context directly to the patch description that explains why this is necessary? Being unfamiliar with SPRIV, I don't really get what is significant about the code example.

beanz added a comment.Jun 15 2022, 7:58 AM

@nikic the most important thing you need to know about SPIR-V is that it is a virtual ISA based on LLVM IR. The ISA itself encodes types for pointers just like LLVM IR would.

nikic added a comment.EditedJun 15 2022, 8:05 AM

@nikic the most important thing you need to know about SPIR-V is that it is a virtual ISA based on LLVM IR. The ISA itself encodes types for pointers just like LLVM IR would.

Okay ... I guess my next question would be how the SPIR-V situation differs from the DXIL situation. For DXIL you have already implemented code to "guess" pointer types insofar as they are relevant -- does SPIR-V need something more than that?

Clang is only permitted to encode pointer type information if that pointer type has some kind of direct semantic meaning -- say, if the pointer element type affects the call ABI in some way. If it does not have direct semantic meaning, then deriving the pointer type based on usage (as DXIL does) and using that for emission would be right approach.

bader added a comment.Jun 15 2022, 8:10 AM

@nikic the most important thing you need to know about SPIR-V is that it is a virtual ISA based on LLVM IR. The ISA itself encodes types for pointers just like LLVM IR would.

And in addition to that ISA defines types, which are not natively supported by LLVM IR e.g. image. To represent those types clang in OpenCL language mode emits a pointer to an opaque structure with special name like opencl.<ISA_type_name> (e.g. opencl.image2d_t). All ISA types, which are defined that way look the same with type-less pointers.
https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/OpenCLImageTypes.def

@nikic the most important thing you need to know about SPIR-V is that it is a virtual ISA based on LLVM IR. The ISA itself encodes types for pointers just like LLVM IR would.

Okay ... I guess my next question would be how the SPIR-V situation differs from the DXIL situation. For DXIL you have already implemented code to "guess" pointer types insofar as they are relevant -- does SPIR-V need something more than that?

Clang is only permitted to encode pointer type information if that pointer type has some kind of direct semantic meaning -- say, if the pointer element type affects the call ABI in some way. If it does not have direct semantic meaning, then deriving the pointer type based on usage (as DXIL does) and using that for emission would be right approach.

I guess the problem might be that we can't always derive the type, for example if we have a translation unit:

extern void foo(int * ptr);
kernel void k(int * ptr) {
  foo(ptr);
}

then if in another translation module you have something like:

void foo(int * ptr){
  *ptr = 1;
}

For this program foo will translate into SPIR-V with different function prototype because in the first unit the type can't be deduced or deduced to some default type but in the second case it is deduced to the exact type. So the functions in two modules will have two different prototypes when mapped into SPIR-V and therefore linking won't be resolved?.. while in OpenCL sources it is just the same function foo.

@Anastasia Thanks, that does sound like a legitimate reason to include the information. I want to double check though, does linking the modules actually fail if the functions have signatures that differ only by pointer types? At least for normal LLVM IR this would work fine, and would just result in the insertion of a bitcast during linking (and then typically the bitcast would get shifted from the called function to the call arguments later).

And in addition to that ISA defines types, which are not natively supported by LLVM IR e.g. image. To represent those types clang in OpenCL language mode emits a pointer to an opaque structure with special name like opencl.<ISA_type_name> (e.g. opencl.image2d_t). All ISA types, which are defined that way look the same with type-less pointers.
https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/OpenCLImageTypes.def

One of the important facts about these types is that the SPIR-V specification doesn't let you actually cast between these types and other types such as integers. (Which is an issue because I've just come across a testcase where, in opaque pointer mode, an OpenCL event type is being used in a ptrtoint to store to a variable as an i64, which I can't legally translate to SPIR-V.) Of course, this probably means that these types need to have their representation in LLVM changed entirely, but I haven't yet done the legwork to experiment in that mode.

I haven't yet tested whether or not this patch is sufficient to support all of the use cases I need for type scavenging for SPIR-V, but one of the things I do see is that there's no support for return type information, which may be a bit of an issue.

Also is there anything that we don't need from the metadata that are currently emitted for the use case i.e. for example argument names? This could reduce the number of metadata emitted...

Sure, I think we don't need all info. As I mentioned in TODO I'm going to leave only type info and remove other metadata emission. For this purpose I'll separate a new function (that emits the type info) from GenOpenCLArgMetadata.

I haven't yet tested whether or not this patch is sufficient to support all of the use cases I need for type scavenging for SPIR-V, but one of the things I do see is that there's no support for return type information, which may be a bit of an issue.

I think the return type information can be added in the next version of the patch, however it should be attached to function declarations (not definitions as it's done now), right? Do you think, declarations also require information about argument types?

Anastasia added a comment.EditedJun 16 2022, 5:25 AM

@Anastasia Thanks, that does sound like a legitimate reason to include the information. I want to double check though, does linking the modules actually fail if the functions have signatures that differ only by pointer types? At least for normal LLVM IR this would work fine, and would just result in the insertion of a bitcast during linking (and then typically the bitcast would get shifted from the called function to the call arguments later).

@nikic If I use spirv-link with two modules that have mismatching pointee type in a function parameter I get an error:

error: 0: Type mismatch on symbol "foo" between imported variable/function %6 and exported variable/function %17.

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

bader added a comment.Jun 16 2022, 5:43 AM

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

FYI: we are experimenting with function pointers on Intel HW programmed via SPIR-V. Extension draft - https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc.

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

FYI: we are experimenting with function pointers on Intel HW programmed via SPIR-V. Extension draft - https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc.

Ok, I see. That confirms I am guessing that in the standard SPIR-V binary we won't be able to apply LLVM IR's trick.

I think the return type information can be added in the next version of the patch, however it should be attached to function declarations (not definitions as it's done now), right? Do you think, declarations also require information about argument types?

Function declarations are more useful to get information than function definitions. A lot of the failures I'm currently seeing in my side of things appear to be due to not getting pointer types right when calling OpenCL/SYCL builtins causing interesting things to happen later on.

There's another approach I've considered taking, which is rather than using a metadata-based approach, emitting elementtype for any pointer-valued arguments [it doesn't solve return values, though]. This does require more changes to LLVM to let such attributes work on non-intrinsic functions, but it may end up working smoother than trying to reparse C types as LLVM types.

If I'm understanding correctly, previously the LLVM pointee type was used to represent a frontend OpenCL type. An alternative to marking everything with elementtype or adding metadata, can something be done in the frontend to mangle the function name with OpenCL pointee types, then the backend translates that to the proper SPIRV types?

If I'm understanding correctly, previously the LLVM pointee type was used to represent a frontend OpenCL type. An alternative to marking everything with elementtype or adding metadata, can something be done in the frontend to mangle the function name with OpenCL pointee types, then the backend translates that to the proper SPIRV types?

We were trying to get away from relying on mangled names in the backend as it is harder to target for non-clang and non-OpenCL based languages.

Clang is only permitted to encode pointer type information if that pointer type has some kind of direct semantic meaning -- say, if the pointer element type affects the call ABI in some way. If it does not have direct semantic meaning, then deriving the pointer type based on usage (as DXIL does) and using that for emission would be right approach.

@nikic what is the way for clang to encode the pointer type? Is it just generated regularly as before or using something like elementtype hint?

kuhar added a subscriber: kuhar.Jun 17 2022, 7:15 AM

Btw do we have any issue tracking this failure, I am having a feeling that the translation of special types might be related to https://github.com/llvm/llvm-project/issues/55121? I am not aware at which point libclc is expected to be linked for SPIR-V but if linking happens at SPIR-V binary level it is likely going to face the other problem I described here.

Either way we should track this problem and we probably need a solution ideally within a month or so, as we can't release clang-15 with broken SPIR-V support... potentially one solution is to enable pointer types just for SPIR-V but this is probably the least option we should consider.

nikic added a comment.Jun 20 2022, 1:07 AM

@Anastasia Thanks, that does sound like a legitimate reason to include the information. I want to double check though, does linking the modules actually fail if the functions have signatures that differ only by pointer types? At least for normal LLVM IR this would work fine, and would just result in the insertion of a bitcast during linking (and then typically the bitcast would get shifted from the called function to the call arguments later).

@nikic If I use spirv-link with two modules that have mismatching pointee type in a function parameter I get an error:

error: 0: Type mismatch on symbol "foo" between imported variable/function %6 and exported variable/function %17.

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

Okay ... can we maybe turn this around then? Always emit function parameters as i8* and bitcast them as needed, even if it is possible to guess a better type based on the definition? (Let's ignore the image type case here, which seems to have different requirements from the rest.)

@Anastasia Thanks, that does sound like a legitimate reason to include the information. I want to double check though, does linking the modules actually fail if the functions have signatures that differ only by pointer types? At least for normal LLVM IR this would work fine, and would just result in the insertion of a bitcast during linking (and then typically the bitcast would get shifted from the called function to the call arguments later).

@nikic If I use spirv-link with two modules that have mismatching pointee type in a function parameter I get an error:

error: 0: Type mismatch on symbol "foo" between imported variable/function %6 and exported variable/function %17.

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

Okay ... can we maybe turn this around then? Always emit function parameters as i8* and bitcast them as needed, even if it is possible to guess a better type based on the definition? (Let's ignore the image type case here, which seems to have different requirements from the rest.)

So where would bitcasts be emitted to reconstruct the function prototypes correctly?

nikic added a comment.Jun 21 2022, 2:15 AM

@Anastasia Thanks, that does sound like a legitimate reason to include the information. I want to double check though, does linking the modules actually fail if the functions have signatures that differ only by pointer types? At least for normal LLVM IR this would work fine, and would just result in the insertion of a bitcast during linking (and then typically the bitcast would get shifted from the called function to the call arguments later).

@nikic If I use spirv-link with two modules that have mismatching pointee type in a function parameter I get an error:

error: 0: Type mismatch on symbol "foo" between imported variable/function %6 and exported variable/function %17.

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

Okay ... can we maybe turn this around then? Always emit function parameters as i8* and bitcast them as needed, even if it is possible to guess a better type based on the definition? (Let's ignore the image type case here, which seems to have different requirements from the rest.)

So where would bitcasts be emitted to reconstruct the function prototypes correctly?

The bitcasts would be in the definition only, when the pointer arguments actually get used in a typed manner. The prototype would always include i8* arguments only.

@Anastasia Thanks, that does sound like a legitimate reason to include the information. I want to double check though, does linking the modules actually fail if the functions have signatures that differ only by pointer types? At least for normal LLVM IR this would work fine, and would just result in the insertion of a bitcast during linking (and then typically the bitcast would get shifted from the called function to the call arguments later).

@nikic If I use spirv-link with two modules that have mismatching pointee type in a function parameter I get an error:

error: 0: Type mismatch on symbol "foo" between imported variable/function %6 and exported variable/function %17.

The way I understand a bitcast instruction in SPIR-V (OpBitcast in https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_conversion_instructions) is that it can only apply to pointer types which are distinct from function types. Note that I believe that function pointers are illegal, at least we disallow them in OpenCL.

Okay ... can we maybe turn this around then? Always emit function parameters as i8* and bitcast them as needed, even if it is possible to guess a better type based on the definition? (Let's ignore the image type case here, which seems to have different requirements from the rest.)

So where would bitcasts be emitted to reconstruct the function prototypes correctly?

The bitcasts would be in the definition only, when the pointer arguments actually get used in a typed manner. The prototype would always include i8* arguments only.

Ok, so does it mean that all pointer parameters in function definitions compiled for SPIR-V targets in clang would just be cast to the typed pointer? Then we only have untyped pointers in the declarations of functions? Perhaps I am missing something but would it not just be easier to keep the pointer types completely as it used to be, since it is going to be present nearly everywhere apart from declarations?