This is an archive of the discontinued LLVM Phabricator instance.

[IR] Permit load/store/alloca for struct with the same scalable vectors.
AbandonedPublic

Authored by HsiangKai on Mar 8 2021, 4:23 AM.

Details

Summary

In this patch, we try to support load/store/alloca for scalable structures. We have posted a RFC for the proposal in the mailing list. https://groups.google.com/g/llvm-dev/c/6ZK2eS4-8t0/m/PG6H1NNDBAAJ

In RISC-V vector intrinsics, we have defined types containing multiple scalable vectors. Due to the flexible configuration of vector types with LMUL, the struct is a good fit to represent these types containing multiple scalable vectors. We permit users to define auto variables using these types. That is why we need to support load, store, and alloca for the scalable structure.

To support load, store and alloca for the scalable structure, we have to achieve the following two points.

  • In StructLayout, we use uint64_t for the StructSize. Currently, we have TypeSize to represent the size of data. We should modify the StructLayout using the TypeSize for data size regardless to support scalable struct or not.
  • In order to handle load and store scalable struct correctly, we need to modify ComputeValueVTs() to return the correct information. The returned Offsets should be represented as TypeSize. In order to collect the correct information, we need to represent the StructSize in StructLayout using TypeSize. (refer to the first point.)

We have some limitation for the scalable structure.

  • Due to the TypeSize could only represent either scalable size or fixed size, we could not mix the fixed objects and scalable objects in a struct. We only support struct with all scalable objects.
  • We only need load/store/alloca scalable struct. We have no need to support GEP for scalable struct. It could alleviate the effort to expand the capabilities of IR.

Diff Detail

Event Timeline

HsiangKai created this revision.Mar 8 2021, 4:23 AM
HsiangKai requested review of this revision.Mar 8 2021, 4:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2021, 4:23 AM
david-arm added inline comments.Mar 9 2021, 2:47 AM
llvm/include/llvm/IR/DataLayout.h
631

This should also return a TypeSize to be consistent with getSizeInBits.

llvm/include/llvm/Support/TypeSize.h
148 ↗(On Diff #328971)

I'm not sure we want to use operators here - see the TypeSize class for how we did this using isKnownXY functions. There was a long history of discussion about the use of operators, which you can see on some of the earlier patches. The problem with operators is that there are cases where you simply do not know the answer. For example, there is no compile-time answer for Fixed(8) < Scalable(4) since we don't know the value of vscale.

HsiangKai updated this revision to Diff 329543.Mar 9 2021, 10:56 PM

Add comments before getSizeInBytes() and remove compare operators in the StackOffset.

HsiangKai added inline comments.Mar 9 2021, 10:59 PM
llvm/include/llvm/IR/DataLayout.h
631

How about keep the interface as so.

In current implementation, the elements will be scalar or scalable. The uses of StructLayout::getSizeInBytes() will be compared with StructLayout::getElementOffset() usually. Keep the interface to return uint64_t should be fine.

I added a FIXME comment before the function.

david-arm added inline comments.Mar 10 2021, 12:08 AM
llvm/include/llvm/IR/DataLayout.h
631

But isn't the problem here that the result is now wrong? We're now returning the minimum value, rather than a TypeSize that tries to express the complete value. Also, it seems quite confusing that returning a size in bits gives you TypeSize and returning a size in bytes gives you uint64_t. This is inconsistent with how sizes are returned elsewhere, for example see:

include/llvm/CodeGen/ValueTypes.h

where we have

TypeSize getSizeInBits() const
TypeSize getStoreSize() const

where the latter returns bytes. If a function must return the minimum value then I think at least the function name should reflect that with a rename, i.e. getMinSizeInBytes() so that the caller understands this is not the actual size.

I'm curious - the restriction that it must be all scalable or all fixed is because we're using TypeSize with only one dimension? It sounds like we'd ideally want to be able to represent the size of structs with something more like StackOffset with two dimensions?

Hi @HsiangKai, this is not a supported use-case for scalable vectors and it is explicitly called out in the LangRef:

Scalable vectors cannot be global variables or members of arrays because their size is unknown at compile time. They are allowed in structs to facilitate intrinsics returning multiple values. Structs containing scalable vectors cannot be used in loads, stores, allocas, or GEPs.

There have been a few conversations about this before. I previously left a comment on D94142 and there was also some discussion on @craig.topper's RFC to add limited support for intrinsics, which can be found here: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147639.html

Supporting scalable vector types as exclusive members of structs that can be used in loads/stores/allocas is a bit of a sliding scale, because the next question will be the one @frasercrmck is asking: "Why can't you mix fixed/scalable types?".

It's been still relatively recent that we've settled on a definition for TypeSize which still needs to be supported in many places in the code-base. Mixing fixed/scalable adds another dimension of complexity, because then all interfaces that take an offset also need to consider the fact that it may be comprised of a fixed-width and scalable-width component. Also, at the moment there isn't really a use-case, there are no languages that allow or support structs/aggregates with scalable members. The original advice given for "What if C/C++ allowed such a use case?" was to solve this in Clang, so as to remove the need for such support in LLVM IR.

The only reason we wanted to allow the use-case for return-values/operands of intrinsics is so that we can model operations that return multiple values which can be scalable, such as first-faulting loads where it returns a data vector (the loaded data) and a predicate mask for the lanes that faulted.

@HsiangKai was there a specific use-case that you had in mind for this?

Hi @HsiangKai, this is not a supported use-case for scalable vectors and it is explicitly called out in the LangRef:

Scalable vectors cannot be global variables or members of arrays because their size is unknown at compile time. They are allowed in structs to facilitate intrinsics returning multiple values. Structs containing scalable vectors cannot be used in loads, stores, allocas, or GEPs.

There have been a few conversations about this before. I previously left a comment on D94142 and there was also some discussion on @craig.topper's RFC to add limited support for intrinsics, which can be found here: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147639.html

Supporting scalable vector types as exclusive members of structs that can be used in loads/stores/allocas is a bit of a sliding scale, because the next question will be the one @frasercrmck is asking: "Why can't you mix fixed/scalable types?".

It's been still relatively recent that we've settled on a definition for TypeSize which still needs to be supported in many places in the code-base. Mixing fixed/scalable adds another dimension of complexity, because then all interfaces that take an offset also need to consider the fact that it may be comprised of a fixed-width and scalable-width component. Also, at the moment there isn't really a use-case, there are no languages that allow or support structs/aggregates with scalable members. The original advice given for "What if C/C++ allowed such a use case?" was to solve this in Clang, so as to remove the need for such support in LLVM IR.

The only reason we wanted to allow the use-case for return-values/operands of intrinsics is so that we can model operations that return multiple values which can be scalable, such as first-faulting loads where it returns a data vector (the loaded data) and a predicate mask for the lanes that faulted.

@HsiangKai was there a specific use-case that you had in mind for this?

We want this to support the segment load/store intrinsics defined here https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md These return 2 to 8 vectors that have been loaded into consecutive registers. I believe SVE has similar instructions. I believe SVE represents these using types wider than their normal scalable vector types and relies on the type legalizer to split them up in the backend. This works for SVE because there is only one known minimum size for all scalable vector types so the type legalizer will always split down to that minimum type.

For RISC-V vectors we already use 7 different sizes of scalable vectors to represent the ability of our instructions to operate on 2, 4, or 8 registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers. The segment load/store instructions add an extra dimension where they can produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for examples. Following the SVE strategy would give us ambiguous types for the type legalizer.

To solve this we would like to use a struct for the segment load/stores to separate them in IR. Since clang needs an address for every variable and needs to be able to load/store them we need to support load/store/alloca.

We want this to support the segment load/store intrinsics defined here https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md These return 2 to 8 vectors that have been loaded into consecutive registers. I believe SVE has similar instructions. I believe SVE represents these using types wider than their normal scalable vector types and relies on the type legalizer to split them up in the backend. This works for SVE because there is only one known minimum size for all scalable vector types so the type legalizer will always split down to that minimum type.

Thanks for providing the context!

For RISC-V vectors we already use 7 different sizes of scalable vectors to represent the ability of our instructions to operate on 2, 4, or 8 registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers. The segment load/store instructions add an extra dimension where they can produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for examples. Following the SVE strategy would give us ambiguous types for the type legalizer.

How does that look in terms of IR? Is the number of registers somehow represented in the (LLVM IR) vector type? Or are the types the same, but the compiler generates different code depending on what mode is set? For SVE we know we can split the vector because <vscale x 8 x i32> is twice the size of <vscale x 4 x i32>, regardless of the value for vscale. Indeed we know SVE vectors area multiple of 128bits, and therefore that <vscale x 4 x i32> is legal. In order to make any assumptions about splitting/legalization, the compiler will need to know which types are legal, and so would expect the compiler to know the mode (2, 4 ,8) for RVV when generating the code, and therefore have similar knowledge about which types are legal and how the vectors are represented/split into registers. How does that lead to ambiguous types?

To solve this we would like to use a struct for the segment load/stores to separate them in IR. Since clang needs an address for every variable and needs to be able to load/store them we need to support load/store/alloca.

These (C/C++-level) intrinsics are probably implemented using target-specific intrinsics or perhaps a common LLVM IR intrinsic like masked.load, which should be able to take/return a struct with scalable members after D94142. If so, it should be possible to handle this in Clang by emitting extractvalue instructions and storing each member individually. That would avoid any changes to LLVM IR. Is that something you've considered?

If we do need to make this work for scalable vectors, I think it needs a message to the mailing list because it's a change to the LangRef and capabilities of scalable vectors, given previous discussions on this topic. I'd like to avoid giving the impression that we're quietly moving the goalpost on what scalable vectors can do in IR.

HsiangKai retitled this revision from [IR] Permit load/store/alloca for struct with the same scalable vectors. to [PoC][IR] Permit load/store/alloca for struct with the same scalable vectors..Mar 13 2021, 5:17 AM

This patch is a proof of concept patch. We should have a formal discussion about the idea before reviewing this patch.

Some related patches:
We model segment load/store types in D97264.
We intend to separate the uses of scalable types and scalable vector types in D98161.

HsiangKai updated this revision to Diff 331747.Mar 18 2021, 8:07 PM
  • Change the return type of getSizeInBytes() to TypeSize.
  • Mark the places we need to take care.

Use TypeSize for offsets instead of StackOffset.

The memebers are all scalable or all fixed objects. We could use TypeSize for offsets. In other places in the current implementation, it uses TypeSize as the offset type. Use TypeSize for member offsets is more consistent with the current implementation.

HsiangKai updated this revision to Diff 332911.Mar 24 2021, 2:46 AM

Fix test failed.

HsiangKai updated this revision to Diff 333660.Mar 27 2021, 1:35 AM

Consider scalable struct types in passes processing StructType and remove TODO comments.

craig.topper added inline comments.Mar 29 2021, 1:27 PM
llvm/include/llvm/IR/DataLayout.h
628–629

The is special code in DataLayout::getStructLayout that allocates the size of StructLayout to account for the needed size of MemberOffsets. That code does not appear to be updated in this patch.

I'm not sure we want to use SmallVector here. The minimum size is increasing the size of every StructLayout object. SmallVector contains the number of elements which is redundant with the NumElements in the StructLayout itself. SmallVector also contains an additional field for the capacity of the allocated memory.

Can we go back to the variable sized array and update DataLayout::getStructLayout to use TypeSize instead of uint64_t to calculate the needed space?

llvm/lib/Analysis/MemoryBuiltins.cpp
354

Was it already a bug that scalable vectors weren't checked for here?

llvm/lib/Analysis/ScalarEvolution.cpp
3447

"could not" -> "cannot"

llvm/lib/CodeGen/Analysis.cpp
92

Can we remove the conditional here and always get the the struct layout? Then we don't need to check for null StructLayout in the loop. This was one of the change we did previously when we allowed scalable vectors in structs for intrinsic returns.

112

Is the operand reordering required? I would hope that TypeSize multiplied by unsigned can be in either order.

llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
773

Does this need to be addressed as part of this review?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1891

Maybe we need a new entrypoint to ComputeValueVTs that does this automatically based on the type passed to argument 2?

llvm/lib/IR/Type.cpp
187

Please address this lint warnig

188

I'm slightly concerned that containsScalableVectorType() is recursive and we're now calling isScalableType in a bunch of places. This could be bad for deeply nested structs.

On that subject, are we allowing or preventing struct of scalable vectors to be part of other structs?

543

One -> Element.

HsiangKai updated this revision to Diff 333993.Mar 29 2021, 3:06 PM
  • Add an assertion to check if it is a scalable struct type before creating GEP instructions.
  • Address comments.
HsiangKai marked 7 inline comments as done.Mar 29 2021, 3:09 PM
HsiangKai updated this revision to Diff 334029.Mar 29 2021, 6:22 PM

Fix build fail.

craig.topper added inline comments.Mar 29 2021, 7:29 PM
llvm/lib/CodeGen/Analysis.cpp
92

Nevermind. We need this to support a mix of scalable and scalars for intrinsic returns.

HsiangKai updated this revision to Diff 334143.Mar 30 2021, 6:50 AM
HsiangKai edited the summary of this revision. (Show Details)

Use TrailingObjects for MemberOffsets.

@HsiangKai can you rebase this on trunk. Some of the TrailingObjects changes would go away.

We want this to support the segment load/store intrinsics defined here https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md These return 2 to 8 vectors that have been loaded into consecutive registers. I believe SVE has similar instructions. I believe SVE represents these using types wider than their normal scalable vector types and relies on the type legalizer to split them up in the backend. This works for SVE because there is only one known minimum size for all scalable vector types so the type legalizer will always split down to that minimum type.

Thanks for providing the context!

For RISC-V vectors we already use 7 different sizes of scalable vectors to represent the ability of our instructions to operate on 2, 4, or 8 registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers. The segment load/store instructions add an extra dimension where they can produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for examples. Following the SVE strategy would give us ambiguous types for the type legalizer.

How does that look in terms of IR? Is the number of registers somehow represented in the (LLVM IR) vector type? Or are the types the same, but the compiler generates different code depending on what mode is set? For SVE we know we can split the vector because <vscale x 8 x i32> is twice the size of <vscale x 4 x i32>, regardless of the value for vscale. Indeed we know SVE vectors area multiple of 128bits, and therefore that <vscale x 4 x i32> is legal. In order to make any assumptions about splitting/legalization, the compiler will need to know which types are legal, and so would expect the compiler to know the mode (2, 4 ,8) for RVV when generating the code, and therefore have similar knowledge about which types are legal and how the vectors are represented/split into registers. How does that lead to ambiguous types?

To solve this we would like to use a struct for the segment load/stores to separate them in IR. Since clang needs an address for every variable and needs to be able to load/store them we need to support load/store/alloca.

These (C/C++-level) intrinsics are probably implemented using target-specific intrinsics or perhaps a common LLVM IR intrinsic like masked.load, which should be able to take/return a struct with scalable members after D94142. If so, it should be possible to handle this in Clang by emitting extractvalue instructions and storing each member individually. That would avoid any changes to LLVM IR. Is that something you've considered?

We have defined types containing multiple scalable vectors and we permit users to use these types to define auto variables. That is why we need load, store and alloca capabilities for scalable structure.

If we do need to make this work for scalable vectors, I think it needs a message to the mailing list because it's a change to the LangRef and capabilities of scalable vectors, given previous discussions on this topic. I'd like to avoid giving the impression that we're quietly moving the goalpost on what scalable vectors can do in IR.

I have posted a RFC for the proposal in the mailing list.
https://groups.google.com/g/llvm-dev/c/6ZK2eS4-8t0/m/PG6H1NNDBAAJ

HsiangKai retitled this revision from [PoC][IR] Permit load/store/alloca for struct with the same scalable vectors. to [IR] Permit load/store/alloca for struct with the same scalable vectors..Apr 20 2021, 8:53 AM
HsiangKai edited the summary of this revision. (Show Details)
HsiangKai abandoned this revision.Jun 2 2021, 7:45 AM