Page MenuHomePhabricator

[IR] Add a target extension type to LLVM.
ClosedPublic

Authored by jcranmer-intel on Oct 4 2022, 2:54 PM.

Details

Summary

Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.

RFC: https://discourse.llvm.org/t/rfc-adding-opaque-types-to-llvm-ir/65326

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nikic added a comment.Oct 15 2022, 2:37 AM

Sorry for the delay in responding to comments, I've been trying to get the subsequent changes that use for SPIR-V polished.

The implementation looks reasonable. In terms of design, I basically have the same concern as everyone else: Hardcoding the layout to ptr seems pretty dubious. The suggestion to specify the layout type as a parameter seems pretty sensible.

As pointed out, specifying an underlying type isn't necessarily feasible.

I couldn't find where this has been pointed out, could you please link me to it? I don't really get why it wouldn't be feasible, given that your current proposal effectively already does it, just with a hardcoded ptr type.

But I could add optional type properties that would include size and alignment to start with (and have those default to sizeof(ptr) and size, respectively)--how does that sound?

That would be fine as well, though probably harder to implement.

Here's a random thought: what if the types are parameterized using metadata instead of ints?

I was originally going to suggest to parameterize them by Constants. But then I realized that this would add a cyclic dependency between values and types. Currently values only use types, but not the other way around. It's probably possible to support this, but it seems like a very large complication that's unlikely to be justified. The same concern holds for metadata (probably to an even larger degree).

nhaehnle added a comment.EditedOct 17 2022, 2:27 AM

Here's a random thought: what if the types are parameterized using metadata instead of ints?

I was originally going to suggest to parameterize them by Constants. But then I realized that this would add a cyclic dependency between values and types. Currently values only use types, but not the other way around. It's probably possible to support this, but it seems like a very large complication that's unlikely to be justified. The same concern holds for metadata (probably to an even larger degree).

Yeah, that's a fair point.

Fix some of the nits.

This does not address the name concerns or size/align/underlying type concerns
yet.

jcranmer-intel marked 4 inline comments as done.Oct 17 2022, 8:54 AM

I couldn't find where this has been pointed out, could you please link me to it? I don't really get why it wouldn't be feasible, given that your current proposal effectively already does it, just with a hardcoded ptr type.

https://discourse.llvm.org/t/rfc-adding-opaque-types-to-llvm-ir/65326/10 was what I was thinking of, but looking back, it's not as thorough as I thought it was.

The broad issue with opaque types in general is that there's not necessarily an underlying type. For SPIR-V, it's the case that these types almost always [1] ultimately lower into a ptr (of some address space) or a i32 type (or i32-stored-as-a-ptr), so pretending that these types are ptr is a viable strategy--indeed, this is what the current SPIR-V target lowering does. I don't know @asb's WASM use case very well, but I imagine treating opaque types as a ptr in a trenchcoat would similarly ultimately work correctly.

Look past these use cases, however, and underlying types get murkier. It's true that I didn't account for these yet, but that was more a matter of punting on design decisions that didn't need addressing for this specific use case. If x86_mmx and x86_amx were implemented as opaque types [2], their underlying type is murkier--perhaps vectors of 64 bit and 8,192 bit respectively, but what vector element type would you pick for them? Some of the exotic hardware I've worked on (whose details are unfortunately not public) could find an opaque type useful, but those types can't find any underlying type that would be useful--something like opaque("hw", float) would mean "you can get a float from me", but treating it as a float in a trenchcoat would be absolutely catastrophic.

My biggest worry with trying to use underlying types is that it would encourage people to think of opaque types as "it's really just a T" when the entire point of opaque types is that it has to be treated differently from a T.

[1] Read as "this is the case in every backend I checked", which is admittedly a very far cry from "all of them". Although the chosen address space does vary between different targets, ptr addrspace(0) is sufficiently pessimistic for size/alignment information.
[2] I'm not proposing doing this, but they do provide some useful checks for what might be relevant.

But I could add optional type properties that would include size and alignment to start with (and have those default to sizeof(ptr) and size, respectively)--how does that sound?

That would be fine as well, though probably harder to implement.

Type properties are already something I've considered necessary anyways, and size/alignment is a very natural place to start; I didn't include them in this patch principally because the code I had didn't need them yet (as relying on the default being ptr was sufficient for my needs, although I now see that it caused more reviewer confusion than I anticipated).

ychen added a subscriber: ychen.Oct 22 2022, 10:31 PM

Major update.

The big change is the great renaming from "opaque type" to "target extension
type". There's still a few places that haven't had there name updated yet. Fixes
to documentation are still needed as well.

In addition, this creates a notion of type properties that is used to fix some
of the issues mentioned above. There is a major unresolved question in this
update as to where the logic to get the type properties of a target extension
type should be located. I see 4 options:

  1. Specified during type creation.
  2. Attach to the module akin to existing identified struct type logic.
  3. Attach to DataLayout via the datalayout string or something similar.
  4. Lookup table built-in to LLVM.

[Note that the current implementation, a static function in Type.cpp, is not any
of these options, but it is most similar to option 4, although I would replace
that with something tablegen-driven most likely, like existing Attributes.td.]

These options are laid out from most to least flexible. While I'm not entirely
happy with how option 4 requires modification to LLVM to effectively add new
target extension types, it would make it about as difficult to add a new type as
it would be to add a new attribute, which seems appropriate.

jcranmer-intel marked 4 inline comments as done.Nov 3 2022, 1:49 PM
nikic added a comment.Nov 4 2022, 2:05 PM

In addition, this creates a notion of type properties that is used to fix some
of the issues mentioned above. There is a major unresolved question in this
update as to where the logic to get the type properties of a target extension
type should be located. I see 4 options:

  1. Specified during type creation.
  2. Attach to the module akin to existing identified struct type logic.
  3. Attach to DataLayout via the datalayout string or something similar.
  4. Lookup table built-in to LLVM.

[Note that the current implementation, a static function in Type.cpp, is not any
of these options, but it is most similar to option 4, although I would replace
that with something tablegen-driven most likely, like existing Attributes.td.]

These options are laid out from most to least flexible. While I'm not entirely
happy with how option 4 requires modification to LLVM to effectively add new
target extension types, it would make it about as difficult to add a new type as
it would be to add a new attribute, which seems appropriate.

A hard constraint here is that we need to be able to determine type layout from just the Type and DataLayout, so I don't think that anything involving the Module is possible.

TBH, I think your current approach of just hardcoding the type properties is fine. We can extend this to a more dynamic solution later if needed. It might make sense to compute the properties on construction though and store them in the type (which would then also be the entry-point to a more dynamic solution, by accepting these as arguments). I don't think we need TableGen for this either, as there's no need to generate multiple tables from one base data set.

llvm/include/llvm-c/Types.h
68 ↗(On Diff #473023)

This doesn't look right...

llvm/include/llvm/IR/Type.h
277

Should this check whether the layout type is sized? (I.e., does it make sense to also support unsized target types?)

Doc tweaks, name tweaks, add some verification code.

Rebase on trunk.

Friendly review ping?

Matt added a subscriber: Matt.Dec 7 2022, 6:25 PM
nikic added inline comments.Dec 8 2022, 7:48 AM
llvm/lib/AsmParser/LLParser.cpp
2982

nit: Can drop explicit SmallVector sizes here.

llvm/lib/IR/Type.cpp
852

It looks like the HasZeroInit property exists, but is never actually checked anywhere? Should either drop it or check it.

855

Probably want to drop this one before landing, as it's part of a different RFC.

858

I'd probably go for defaulting to an unsized (and thus maximally limited) type. i1 seems like an odd choice.

Address review comments.

nikic accepted this revision.Dec 10 2022, 12:54 AM

LGTM

llvm/docs/LangRef.rst
3624

Possibly this should clarify that not all target extension types can have zeroinitializer.

llvm/test/Assembler/target-type-properties.ll
2 ↗(On Diff #481733)

This is pretty weird, use split-file instead?

This revision is now accepted and ready to land.Dec 10 2022, 12:54 AM

Documentation and test tweaks.

arsenm added a subscriber: arsenm.Dec 14 2022, 11:32 AM
arsenm added inline comments.
llvm/test/Assembler/target-type-properties.ll
13 ↗(On Diff #482940)

Is empty string valid?

the comments and patch description are inconsistent about "opaque" vs "target extension", cleaning this up would be good for avoiding confusion in the future

jcranmer-intel retitled this revision from [IR] Add an opaque type to LLVM. to [IR] Add a target extension type to LLVM..Dec 16 2022, 7:25 AM
jcranmer-intel edited the summary of this revision. (Show Details)

the comments and patch description are inconsistent about "opaque" vs "target extension", cleaning this up would be good for avoiding confusion in the future

I updated the git commit message locally, but it seems that Arcanist wasn't taking the changes into account here.

llvm/test/Assembler/target-type-properties.ll
13 ↗(On Diff #482940)

Yes, it should be.

Rebase to trunk, and add release notes.

Some pedantic nits.
"opaque" remains in some places.

llvm/docs/ReleaseNotes.rst
108 ↗(On Diff #483996)

"target-specific types whose types" is confusing.
Do I read it wrong or did you mean something different? In any case this should be clarified.

llvm/include/llvm-c/Core.h
289

Nit: add comma

llvm/include/llvm/IR/Constants.h
868

Above comments are missing '.' in the end and this one ends with a colon.

llvm/lib/IR/Type.cpp
837

The struct should be in anonymous namespace.

Address review comments (and clang-format issues).

barannikov88 accepted this revision.Dec 19 2022, 1:55 PM
barannikov88 added inline comments.
llvm/docs/ReleaseNotes.rst
109 ↗(On Diff #484057)

I hesitate to make comments about someone's English because my English is not so good.
The sentence still does not sound right to me, this part: "types ... are not introspectable by ... types"
Please ignore if everything is OK.

Fix typo in release notes.

llvm/docs/ReleaseNotes.rst
109 ↗(On Diff #484057)

The word was meant to be "optimizations", not "types". The casualties of writing too quickly. :-)

This revision was landed with ongoing or failed builds.Dec 20 2022, 8:04 AM
This revision was automatically updated to reflect the committed changes.

@jcranmer-intel
A new build warning:
llvm/lib/Target/Hexagon/HexagonTargetObjectFile.cpp:296:11: warning: enumeration value 'TargetExtTyID' not handled in switch [-Wswitch]

@jcranmer-intel
A new build warning:
llvm/lib/Target/Hexagon/HexagonTargetObjectFile.cpp:296:11: warning: enumeration value 'TargetExtTyID' not handled in switch [-Wswitch]

Fixed via https://reviews.llvm.org/rGaa34a9d108bd79eb0ac40979a7bdae4b19b1624d

zixuan-wu added inline comments.
llvm/lib/IR/Type.cpp
839

Can the LayoutType be struct type, or scalable vector type?

852

Is there anymore better code structure to separate target-specific code to get TargetTypeInfo such as individual hook?

hi, @jcranmer-intel , could you please propose a backend demo revision to demonstrate how to use targetext type of backend?

hctim added a subscriber: hctim.Jan 4 2023, 10:04 AM

Hey, looks like this patch may have introduced a memory leak, as reported below by ASan.

This is caught on my local machine, which doesn't use an asan-instrumented libcxx (like our bots do). This does result in some false negatives, so an instrumented libcxx is always a better option, but this is a nice fast build for me to reproduce failures and bisect faster.

The stack trace points to a std::string heap leak, and so my hunch was that the bots didn't pick up this failure due to the different point at which libcxx and libstdc++ stop inlining the string and push it off to the heap. That turned out to be correct, changing the test like so even makes it reproduce using an instrumented libcxx (and thus the bots upstream):

diff --git a/llvm/test/Assembler/target-types.ll b/llvm/test/Assembler/target-types.ll
index 741a1a159678..7497fa8806df 100644
--- a/llvm/test/Assembler/target-types.ll
+++ b/llvm/test/Assembler/target-types.ll
@@ -1,7 +1,7 @@
 ; RUN: llvm-as < %s | llvm-dis | FileCheck %s
 ; Check support for basic target extension type usage

-@global = global target("spirv.DeviceEvent") zeroinitializer
+@global = global target("spirv.DeviceEventXXXXXX") zeroinitializer

 define target("spirv.Sampler") @foo(target("spirv.Sampler") %a) {
   ret target("spirv.Sampler") %a
@@ -13,7 +13,7 @@ define target("spirv.Event") @func2() {
   ret target("spirv.Event") poison
 }

-; CHECK: @global = global target("spirv.DeviceEvent") zeroinitializer
+; CHECK: @global = global target("spirv.DeviceEventXXXXXX") zeroinitializer
 ; CHECK: define target("spirv.Sampler") @foo(target("spirv.Sampler") %a) {
 ; CHECK:   ret target("spirv.Sampler") %a
 ; CHECK: }

My invocation for the "fast asan build" and the breaking unit test which uses libstdc++ is below. This fails at e6b02214c68df2c9f826e02310c9352ac652e456 (this commit) but passes the commit before.

$ cmake \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_USE_LINKER=lld \
-GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_FLAGS="-fsanitize=address" \
-DCMAKE_CXX_FLAGS="-fsanitize=address" \
-DLLVM_ENABLE_PROJECTS="'clang;lld;clang-tools-extra;mlir'" \
-DLLVM_LIBC_ENABLE_LINTING=OFF \
-DLLVM_USE_SANITIZER=Address \
/llvm
$ LIT_OPTS='--filter=Assembler/target-types.ll' ninja check-llvm
Direct leak of 18 byte(s) in 1 object(s) allocated from:
    #0 0x557fdbae956d in operator new(unsigned long) /compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0x557fdbeb4f33 in _M_construct<const char *> /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.tcc:225:14
    #2 0x557fdbeb4f33 in basic_string /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.h:620:2
    #3 0x557fdbeb4f33 in basic_string /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.h:188:9
    #4 0x557fdbeb4f33 in basic_string<llvm::StringRef, void> /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/basic_string.h:787:4
    #5 0x557fdbeb4f33 in llvm::TargetExtType::TargetExtType(llvm::LLVMContext&, llvm::StringRef, llvm::ArrayRef<llvm::Type*>, llvm::ArrayRef<unsigned int>) /llvm/lib/IR/Type.cpp:796:31
    #6 0x557fdbeb6104 in llvm::TargetExtType::get(llvm::LLVMContext&, llvm::StringRef, llvm::ArrayRef<llvm::Type*>, llvm::ArrayRef<unsigned int>) /llvm/lib/IR/Type.cpp:830:14
    #7 0x557fdbb8ce03 in parseTypeTableBody /llvm/lib/Bitcode/Reader/BitcodeReader.cpp:2514:18
    #8 0x557fdbb8ce03 in parseTypeTable /llvm/lib/Bitcode/Reader/BitcodeReader.cpp:2252:10
    #9 0x557fdbb8ce03 in (anonymous namespace)::BitcodeReader::parseModule(unsigned long, bool, llvm::function_ref<std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> (llvm::StringRef)>) /llvm/lib/Bitcode/Reader/BitcodeReader.cpp:4245:25
    #10 0x557fdbb056e3 in parseBitcodeInto /llvm/lib/Bitcode/Reader/BitcodeReader.cpp:4476:10
    #11 0x557fdbb056e3 in llvm::BitcodeModule::getModuleImpl(llvm::LLVMContext&, bool, bool, bool, llvm::function_ref<std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> (llvm::StringRef)>) /llvm/lib/Bitcode/Reader/BitcodeReader.cpp:7927:22
    #12 0x557fdbb09212 in llvm::BitcodeModule::getLazyModule(llvm::LLVMContext&, bool, bool) /llvm/lib/Bitcode/Reader/BitcodeReader.cpp:7946:10
    #13 0x557fdbaf1144 in main /llvm/tools/llvm-dis/llvm-dis.cpp:205:16
    #14 0x7f137e229209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: 18 byte(s) leaked in 1 allocation(s).

hi, @jcranmer-intel , could you please propose a backend demo revision to demonstrate how to use targetext type of backend?

I don't have an example backend that goes through GlobalISel or SelectionDAG yet; I've been focusing my effort on the production SPIR-V backend which translates to SPIR-V straight from LLVM IR.

There may need to be some additional work to map these types to appropriate MVT types in the codegen phases, but my knowledge of codegen infrastructure is somewhat shaky, and this is highly dependent on how different backends wish to translate these types.

Hey, looks like this patch may have introduced a memory leak, as reported below by ASan.

That is a good catch. I have a fix for this, I think, in https://reviews.llvm.org/D141083 -- can you test it out?

llvm/lib/IR/Type.cpp
839

Yes.

852

In principle, it is possible to provide target hooks that could let individual targets define their own TargetTypeInfo tables. However, there is a bit of a layering issue: the information needed is too verbose to stuff into a datalayout string, and the existing target hooks aren't in a place where they can make it to DataLayout (who needs to know the size) or Verifier (for the other properties). Something new would have to be added, and it's not clear that it provides significant advantages over a hardcoded list for now.

zixuan-wu added inline comments.Mon, Jan 9, 10:59 PM
llvm/lib/IR/Type.cpp
852

It's OK for me to evolve or iterate the hardcoded problem later.

zixuan-wu added a comment.EditedMon, Jan 9, 11:07 PM

I don't have an example backend that goes through GlobalISel or SelectionDAG yet; I've been focusing my effort on the production SPIR-V backend which translates to SPIR-V straight from LLVM IR.

There may need to be some additional work to map these types to appropriate MVT types in the codegen phases, but my knowledge of codegen infrastructure is somewhat shaky, and this is highly dependent on how different backends wish to translate these types.

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

zixuan-wu added a subscriber: pengfei.
nikic added a comment.Tue, Jan 10, 8:37 AM

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects.

Moving x86_amx to use a target-extension type would be great.

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

I have no objections to this; indeed, my original proposal for target extension types included x86_mmx and x86_amx as types that could be implemented as target extension types. However, I'm not involved with the x86 backend, so I must defer to the proper codeowners there as to whether or not to do this.

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

I have no objections to this; indeed, my original proposal for target extension types included x86_mmx and x86_amx as types that could be implemented as target extension types. However, I'm not involved with the x86 backend, so I must defer to the proper codeowners there as to whether or not to do this.

I think it's more appropriate to start with x86_fp80 and x86_mmx. I'd expect there will be a more general matrix representation for the AMX type, e.g., TLX, <8x8xi8> etc. rather than limit it to target extension type, though I'm not against to do prototype for it.

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

I have no objections to this; indeed, my original proposal for target extension types included x86_mmx and x86_amx as types that could be implemented as target extension types. However, I'm not involved with the x86 backend, so I must defer to the proper codeowners there as to whether or not to do this.

I think it's more appropriate to start with x86_fp80 and x86_mmx. I'd expect there will be a more general matrix representation for the AMX type, e.g., TLX, <8x8xi8> etc. rather than limit it to target extension type, though I'm not against to do prototype for it.

Aren't both x86_fp80 and x86_mmx recognized by InstCombine? They aren't truly opaque.

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

I have no objections to this; indeed, my original proposal for target extension types included x86_mmx and x86_amx as types that could be implemented as target extension types. However, I'm not involved with the x86 backend, so I must defer to the proper codeowners there as to whether or not to do this.

I think it's more appropriate to start with x86_fp80 and x86_mmx. I'd expect there will be a more general matrix representation for the AMX type, e.g., TLX, <8x8xi8> etc. rather than limit it to target extension type, though I'm not against to do prototype for it.

Aren't both x86_fp80 and x86_mmx recognized by InstCombine? They aren't truly opaque.

x86_amx is also recognized by InstCombine, but may not as much as x86_mmx.

zixuan-wu added a subscriber: lattner.EditedWed, Jan 11, 8:15 PM

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

I have no objections to this; indeed, my original proposal for target extension types included x86_mmx and x86_amx as types that could be implemented as target extension types. However, I'm not involved with the x86 backend, so I must defer to the proper codeowners there as to whether or not to do this.

I think it's more appropriate to start with x86_fp80 and x86_mmx. I'd expect there will be a more general matrix representation for the AMX type, e.g., TLX, <8x8xi8> etc. rather than limit it to target extension type, though I'm not against to do prototype for it.

Well, as chris's @lattner comment of TLX which I also prefer, it's not a good way to support high level type in LLVM IR such as Matrix(or multi-dimension vector). We can convert matrix to fixed vector type and leverage with those intrinsics matrix-intrinsics, if we need fixed sized matrix. I am not sure whether it's sustainable and workable to completely develop AMX feature. For example, need we distinguish vector type and matrix when we addRegisterClass? And another point is do we really need different shape matrix in LLVM IR or just one type called x86amx?

I'd like to enable the backend infra and propose draft later. To better demonstrate and enable backend target-independent code completely, I'd like to demonstrate with x86_amx that changing x86_amx type to target extension type and make all test passed, if nobody objects. @jcranmer-intel @LuoYuanke @craig.topper @pengfei

I have no objections to this; indeed, my original proposal for target extension types included x86_mmx and x86_amx as types that could be implemented as target extension types. However, I'm not involved with the x86 backend, so I must defer to the proper codeowners there as to whether or not to do this.

I think it's more appropriate to start with x86_fp80 and x86_mmx. I'd expect there will be a more general matrix representation for the AMX type, e.g., TLX, <8x8xi8> etc. rather than limit it to target extension type, though I'm not against to do prototype for it.

Aren't both x86_fp80 and x86_mmx recognized by InstCombine? They aren't truly opaque.

Now I am not sure whether it still needs to respect the mmx or amx type in InstCombine. Maybe it can be removed, or respect the target extension type directly instead.