Page MenuHomePhabricator

[IR] Add an opaque type to LLVM.
Needs ReviewPublic

Authored by jcranmer-intel on Oct 4 2022, 2:54 PM.

Details

Summary

Opaque types represent types that need to be preserved through optimization, but
otherwise are not introspectable by target-independent optimizations. This patch
doesn't add any uses of these opaque types by an existing backends, it only
provides basic infrastructure such that these types would work correctly.

RFC: https://discourse.llvm.org/t/rfc-adding-opaque-types-to-llvm-ir/65326

Diff Detail

Event Timeline

jcranmer-intel created this revision.Oct 4 2022, 2:54 PM
Herald added a project: Restricted Project. · View Herald Transcript
jcranmer-intel requested review of this revision.Oct 4 2022, 2:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 2:54 PM
nikic edited the summary of this revision. (Show Details)Oct 5 2022, 1:01 AM
arichardson added inline comments.Oct 5 2022, 1:10 AM
llvm/include/llvm/IR/DataLayout.h
677

Why is the size hardcoded to an as0 pointer? Wouldn't it make more sense to e.g. have the first type parameter define the underlying size?

nikic added a comment.Oct 5 2022, 1:57 AM

The implementation looks reasonable. In terms of design, I basically have the same concern as everyone else: Hardcoding the layout to ptr seems pretty dubious. The suggestion to specify the layout type as a parameter seems pretty sensible.

Less importantly, I wonder whether the integer parameters should be represented as APInt rather than 32-bit ints.

llvm/docs/BitCodeFormat.rst
1342

nit: You need moar ^

llvm/docs/LangRef.rst
3613

Something that's a bit confusing is that we have both "opaque type" and "opaque structure type", the former spelled %x = type opaque, the latter spelled opaque("x"). This seems rather unfortunate.

I think a viable way forward would be to keep the old notion of opaque types, but move away from the "opaque" terminology for them. Effectively, an opaque struct type is just an identified struct type with zero members.

3638

There should be a mention of the type size and alignment somewhere.

llvm/include/llvm/IR/DerivedTypes.h
758

You can likely = None these and don't need a separate overload.

llvm/lib/IR/Constants.cpp
1093

This looks pretty dubious. We probably shouldn't be modelling opaque types using ConstantAggregateZero, as it is not an aggregate type.

Is zeroinitializer support actually required?

llvm/lib/IR/Function.cpp
917

This must be unique per type, so you need to include all the nested information as well. You can add tests using the ssa.copy intrinsic.

llvm/lib/IR/LLVMContextImpl.h
213

return Name == that.Name && TypeParams == that.TypeParams && IntParams == that.IntParams?

llvm/lib/Transforms/Scalar/SROA.cpp
1723

Does CastInst::isBitCastable need an adjustment?

nhaehnle added inline comments.Oct 5 2022, 3:43 AM
llvm/docs/LangRef.rst
3613

Thinking about this a bit more, I think it would be good to move away from the "opaque" naming entirely and instead call these new types something like an "extension" type or a "generic" type.

It's a bit of a philosophical point, but the types being added here aren't necessarily opaque in a universal sense. They are opaque to core IR instructions, but they are not opaque to e.g. intrinsics that are added specifically for working with them.

(And the same is true for the "opaque" structs as well. "Identified struct type with zero members" is what they are, so why not just run with it.)

jcranmer-intel added a comment.EditedOct 7 2022, 12:28 PM

Sorry for the delay in responding to comments, I've been trying to get the subsequent changes that use for SPIR-V polished.

The implementation looks reasonable. In terms of design, I basically have the same concern as everyone else: Hardcoding the layout to ptr seems pretty dubious. The suggestion to specify the layout type as a parameter seems pretty sensible.

As pointed out, specifying an underlying type isn't necessarily feasible. But I could add optional type properties that would include size and alignment to start with (and have those default to sizeof(ptr) and size, respectively)--how does that sound?

Less importantly, I wonder whether the integer parameters should be represented as APInt rather than 32-bit ints.

I've considered this as well. But I strongly doubt that there's a need to go up that high at this point, and nothing in the current design blocks revisiting this should we need to in the future (if my understanding of bitcode is correct), so I'm inclined to keep it as-is for now.

llvm/docs/LangRef.rst
3613

"Opaque types" has some precedent (it's what the SPIR-V specification calls these types), but I suspect that terminology derives from opaque types, given the role of LLVM in SPIR's history.

%x = type opaque and opaque("x") being similar is indeed very unfortunate. For a while, I've been thinking that opaque pointers themselves make opaque structs obsolete--there's very little you can do on them once a pointer to them is indistinguishable from any other pointer--which would help remove the confusion.

As far a better name, I'm having a tough time coming up with a short name. Something with "target" in it I think captures the semantics well, but "target type" feels insufficient, and "target-extension type" or "target-specified type" a little wordy. But moving to a target("spirv.Image", void, 2, 0, 0, 0, 0, 0, 1) syntax (to use an actual use of this for SPIR-V) I think reads slightly better than using opaque.

3638

Noted, will fix.

llvm/lib/IR/Constants.cpp
1093

Yes, for at least some of the types. (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpConstantNull lists the opaque types for which something like null exists).

I agree that ConstantAggregateZero is somewhat dubious as a class, but it seemed more fitting than any other class, not least of which being the fact that it causes zeroinitializer to work without any extra effort.

I'm open to suggestions as to what to name a new zeroinitializer class for opaque pointers--I don't have any great names off my head.

llvm/lib/IR/Function.cpp
917

Good catch--I had originally coded this up without any type/integer parameters, and missed the need to change this location as well. Will fix!

llvm/lib/Transforms/Scalar/SROA.cpp
1723

It turns out not to need it--CastInst::isBitCastable returns false if Type::getPrimitiveSizeInBits returns 0 for either parameter, and Type::getPrimitiveSizeInBits returns 0 for opaque types.

(This made it easy to confirm when optimizations caused problems--they tried to create bitcast instructions with opaque types, and that failed.) I can add an extra check for safety.

nhaehnle added inline comments.Oct 10 2022, 12:39 AM
llvm/docs/LangRef.rst
3613

How about ext(...)? But target(...) is fine with me, too.

llvm/lib/IR/Constants.cpp
1093

Would an intrinsic that returns the null value work? Or are null initializers also allowed in global variables?

Here's a random thought: what if the types are parameterized using metadata instead of ints?

Here's a random thought: what if the types are parameterized using metadata instead of ints?

My original thought was using ConstantInt, actually, and that should in principle be relatively easy to parse. Getting the integer value out of metadata is a pretty circuitous route, and if you want arbitrary-sized integers, ConstantInt or an APInt are much simpler routes.

One downside with ConstantInt is that it implies the *size* of the integer matters for distinguishing types (e.g., ext("a", i32 10) is not the same as ext("a", i64 10)). That makes APInt seem a better solution to me than ConstantInt, but I'm not entirely certain the extra flexibility over a bare unsigned is necessary--if it does turn out to be the case, upgrading unsigned to APInt could still be done without substantial changes to syntax or API.

llvm/lib/IR/Constants.cpp
1093

Global variable null initializers are I believe necessary.

nikic added a comment.Oct 15 2022, 2:37 AM

Sorry for the delay in responding to comments, I've been trying to get the subsequent changes that use for SPIR-V polished.

The implementation looks reasonable. In terms of design, I basically have the same concern as everyone else: Hardcoding the layout to ptr seems pretty dubious. The suggestion to specify the layout type as a parameter seems pretty sensible.

As pointed out, specifying an underlying type isn't necessarily feasible.

I couldn't find where this has been pointed out, could you please link me to it? I don't really get why it wouldn't be feasible, given that your current proposal effectively already does it, just with a hardcoded ptr type.

But I could add optional type properties that would include size and alignment to start with (and have those default to sizeof(ptr) and size, respectively)--how does that sound?

That would be fine as well, though probably harder to implement.

Here's a random thought: what if the types are parameterized using metadata instead of ints?

I was originally going to suggest to parameterize them by Constants. But then I realized that this would add a cyclic dependency between values and types. Currently values only use types, but not the other way around. It's probably possible to support this, but it seems like a very large complication that's unlikely to be justified. The same concern holds for metadata (probably to an even larger degree).

nhaehnle added a comment.EditedOct 17 2022, 2:27 AM

Here's a random thought: what if the types are parameterized using metadata instead of ints?

I was originally going to suggest to parameterize them by Constants. But then I realized that this would add a cyclic dependency between values and types. Currently values only use types, but not the other way around. It's probably possible to support this, but it seems like a very large complication that's unlikely to be justified. The same concern holds for metadata (probably to an even larger degree).

Yeah, that's a fair point.

Fix some of the nits.

This does not address the name concerns or size/align/underlying type concerns
yet.

jcranmer-intel marked 4 inline comments as done.Oct 17 2022, 8:54 AM

I couldn't find where this has been pointed out, could you please link me to it? I don't really get why it wouldn't be feasible, given that your current proposal effectively already does it, just with a hardcoded ptr type.

https://discourse.llvm.org/t/rfc-adding-opaque-types-to-llvm-ir/65326/10 was what I was thinking of, but looking back, it's not as thorough as I thought it was.

The broad issue with opaque types in general is that there's not necessarily an underlying type. For SPIR-V, it's the case that these types almost always [1] ultimately lower into a ptr (of some address space) or a i32 type (or i32-stored-as-a-ptr), so pretending that these types are ptr is a viable strategy--indeed, this is what the current SPIR-V target lowering does. I don't know @asb's WASM use case very well, but I imagine treating opaque types as a ptr in a trenchcoat would similarly ultimately work correctly.

Look past these use cases, however, and underlying types get murkier. It's true that I didn't account for these yet, but that was more a matter of punting on design decisions that didn't need addressing for this specific use case. If x86_mmx and x86_amx were implemented as opaque types [2], their underlying type is murkier--perhaps vectors of 64 bit and 8,192 bit respectively, but what vector element type would you pick for them? Some of the exotic hardware I've worked on (whose details are unfortunately not public) could find an opaque type useful, but those types can't find any underlying type that would be useful--something like opaque("hw", float) would mean "you can get a float from me", but treating it as a float in a trenchcoat would be absolutely catastrophic.

My biggest worry with trying to use underlying types is that it would encourage people to think of opaque types as "it's really just a T" when the entire point of opaque types is that it has to be treated differently from a T.

[1] Read as "this is the case in every backend I checked", which is admittedly a very far cry from "all of them". Although the chosen address space does vary between different targets, ptr addrspace(0) is sufficiently pessimistic for size/alignment information.
[2] I'm not proposing doing this, but they do provide some useful checks for what might be relevant.

But I could add optional type properties that would include size and alignment to start with (and have those default to sizeof(ptr) and size, respectively)--how does that sound?

That would be fine as well, though probably harder to implement.

Type properties are already something I've considered necessary anyways, and size/alignment is a very natural place to start; I didn't include them in this patch principally because the code I had didn't need them yet (as relying on the default being ptr was sufficient for my needs, although I now see that it caused more reviewer confusion than I anticipated).

ychen added a subscriber: ychen.Oct 22 2022, 10:31 PM

Major update.

The big change is the great renaming from "opaque type" to "target extension
type". There's still a few places that haven't had there name updated yet. Fixes
to documentation are still needed as well.

In addition, this creates a notion of type properties that is used to fix some
of the issues mentioned above. There is a major unresolved question in this
update as to where the logic to get the type properties of a target extension
type should be located. I see 4 options:

  1. Specified during type creation.
  2. Attach to the module akin to existing identified struct type logic.
  3. Attach to DataLayout via the datalayout string or something similar.
  4. Lookup table built-in to LLVM.

[Note that the current implementation, a static function in Type.cpp, is not any
of these options, but it is most similar to option 4, although I would replace
that with something tablegen-driven most likely, like existing Attributes.td.]

These options are laid out from most to least flexible. While I'm not entirely
happy with how option 4 requires modification to LLVM to effectively add new
target extension types, it would make it about as difficult to add a new type as
it would be to add a new attribute, which seems appropriate.

jcranmer-intel marked 4 inline comments as done.Nov 3 2022, 1:49 PM
nikic added a comment.Nov 4 2022, 2:05 PM

In addition, this creates a notion of type properties that is used to fix some
of the issues mentioned above. There is a major unresolved question in this
update as to where the logic to get the type properties of a target extension
type should be located. I see 4 options:

  1. Specified during type creation.
  2. Attach to the module akin to existing identified struct type logic.
  3. Attach to DataLayout via the datalayout string or something similar.
  4. Lookup table built-in to LLVM.

[Note that the current implementation, a static function in Type.cpp, is not any
of these options, but it is most similar to option 4, although I would replace
that with something tablegen-driven most likely, like existing Attributes.td.]

These options are laid out from most to least flexible. While I'm not entirely
happy with how option 4 requires modification to LLVM to effectively add new
target extension types, it would make it about as difficult to add a new type as
it would be to add a new attribute, which seems appropriate.

A hard constraint here is that we need to be able to determine type layout from just the Type and DataLayout, so I don't think that anything involving the Module is possible.

TBH, I think your current approach of just hardcoding the type properties is fine. We can extend this to a more dynamic solution later if needed. It might make sense to compute the properties on construction though and store them in the type (which would then also be the entry-point to a more dynamic solution, by accepting these as arguments). I don't think we need TableGen for this either, as there's no need to generate multiple tables from one base data set.

llvm/include/llvm-c/Types.h
68 ↗(On Diff #473023)

This doesn't look right...

llvm/include/llvm/IR/Type.h
291

Should this check whether the layout type is sized? (I.e., does it make sense to also support unsized target types?)

Doc tweaks, name tweaks, add some verification code.

Rebase on trunk.

Friendly review ping?