Page MenuHomePhabricator

[PoC][IR] Permit load/store/alloca for struct with the same scalable vectors.
Needs ReviewPublic

Authored by HsiangKai on Mar 8 2021, 4:23 AM.

Details

Summary

In the current StructLayout implementation, it uses uint64_t to represent the size of struct and offsets of struct members. We use TypeSize for the size of struct and the offsets of elements. In this way, we could record the correct information in the StructLayout when it contains scalable elements. However, TypeSize is a one-dimension polynomial type. To minimize the impact to the current implementation and to fit our requirements, we only permit load/store/alloca all scalable types in a struct or all fixed length types in a struct. That is, TypeSize is either scalable size or fixed size.

Diff Detail

Event Timeline

HsiangKai created this revision.Mar 8 2021, 4:23 AM
HsiangKai requested review of this revision.Mar 8 2021, 4:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2021, 4:23 AM
david-arm added inline comments.Mar 9 2021, 2:47 AM
llvm/include/llvm/IR/DataLayout.h
631

This should also return a TypeSize to be consistent with getSizeInBits.

llvm/include/llvm/Support/TypeSize.h
148 ↗(On Diff #328971)

I'm not sure we want to use operators here - see the TypeSize class for how we did this using isKnownXY functions. There was a long history of discussion about the use of operators, which you can see on some of the earlier patches. The problem with operators is that there are cases where you simply do not know the answer. For example, there is no compile-time answer for Fixed(8) < Scalable(4) since we don't know the value of vscale.

HsiangKai updated this revision to Diff 329543.Mar 9 2021, 10:56 PM

Add comments before getSizeInBytes() and remove compare operators in the StackOffset.

HsiangKai added inline comments.Mar 9 2021, 10:59 PM
llvm/include/llvm/IR/DataLayout.h
631

How about keep the interface as so.

In current implementation, the elements will be scalar or scalable. The uses of StructLayout::getSizeInBytes() will be compared with StructLayout::getElementOffset() usually. Keep the interface to return uint64_t should be fine.

I added a FIXME comment before the function.

david-arm added inline comments.Mar 10 2021, 12:08 AM
llvm/include/llvm/IR/DataLayout.h
631

But isn't the problem here that the result is now wrong? We're now returning the minimum value, rather than a TypeSize that tries to express the complete value. Also, it seems quite confusing that returning a size in bits gives you TypeSize and returning a size in bytes gives you uint64_t. This is inconsistent with how sizes are returned elsewhere, for example see:

include/llvm/CodeGen/ValueTypes.h

where we have

TypeSize getSizeInBits() const
TypeSize getStoreSize() const

where the latter returns bytes. If a function must return the minimum value then I think at least the function name should reflect that with a rename, i.e. getMinSizeInBytes() so that the caller understands this is not the actual size.

I'm curious - the restriction that it must be all scalable or all fixed is because we're using TypeSize with only one dimension? It sounds like we'd ideally want to be able to represent the size of structs with something more like StackOffset with two dimensions?

Hi @HsiangKai, this is not a supported use-case for scalable vectors and it is explicitly called out in the LangRef:

Scalable vectors cannot be global variables or members of arrays because their size is unknown at compile time. They are allowed in structs to facilitate intrinsics returning multiple values. Structs containing scalable vectors cannot be used in loads, stores, allocas, or GEPs.

There have been a few conversations about this before. I previously left a comment on D94142 and there was also some discussion on @craig.topper's RFC to add limited support for intrinsics, which can be found here: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147639.html

Supporting scalable vector types as exclusive members of structs that can be used in loads/stores/allocas is a bit of a sliding scale, because the next question will be the one @frasercrmck is asking: "Why can't you mix fixed/scalable types?".

It's been still relatively recent that we've settled on a definition for TypeSize which still needs to be supported in many places in the code-base. Mixing fixed/scalable adds another dimension of complexity, because then all interfaces that take an offset also need to consider the fact that it may be comprised of a fixed-width and scalable-width component. Also, at the moment there isn't really a use-case, there are no languages that allow or support structs/aggregates with scalable members. The original advice given for "What if C/C++ allowed such a use case?" was to solve this in Clang, so as to remove the need for such support in LLVM IR.

The only reason we wanted to allow the use-case for return-values/operands of intrinsics is so that we can model operations that return multiple values which can be scalable, such as first-faulting loads where it returns a data vector (the loaded data) and a predicate mask for the lanes that faulted.

@HsiangKai was there a specific use-case that you had in mind for this?

Hi @HsiangKai, this is not a supported use-case for scalable vectors and it is explicitly called out in the LangRef:

Scalable vectors cannot be global variables or members of arrays because their size is unknown at compile time. They are allowed in structs to facilitate intrinsics returning multiple values. Structs containing scalable vectors cannot be used in loads, stores, allocas, or GEPs.

There have been a few conversations about this before. I previously left a comment on D94142 and there was also some discussion on @craig.topper's RFC to add limited support for intrinsics, which can be found here: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147639.html

Supporting scalable vector types as exclusive members of structs that can be used in loads/stores/allocas is a bit of a sliding scale, because the next question will be the one @frasercrmck is asking: "Why can't you mix fixed/scalable types?".

It's been still relatively recent that we've settled on a definition for TypeSize which still needs to be supported in many places in the code-base. Mixing fixed/scalable adds another dimension of complexity, because then all interfaces that take an offset also need to consider the fact that it may be comprised of a fixed-width and scalable-width component. Also, at the moment there isn't really a use-case, there are no languages that allow or support structs/aggregates with scalable members. The original advice given for "What if C/C++ allowed such a use case?" was to solve this in Clang, so as to remove the need for such support in LLVM IR.

The only reason we wanted to allow the use-case for return-values/operands of intrinsics is so that we can model operations that return multiple values which can be scalable, such as first-faulting loads where it returns a data vector (the loaded data) and a predicate mask for the lanes that faulted.

@HsiangKai was there a specific use-case that you had in mind for this?

We want this to support the segment load/store intrinsics defined here https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md These return 2 to 8 vectors that have been loaded into consecutive registers. I believe SVE has similar instructions. I believe SVE represents these using types wider than their normal scalable vector types and relies on the type legalizer to split them up in the backend. This works for SVE because there is only one known minimum size for all scalable vector types so the type legalizer will always split down to that minimum type.

For RISC-V vectors we already use 7 different sizes of scalable vectors to represent the ability of our instructions to operate on 2, 4, or 8 registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers. The segment load/store instructions add an extra dimension where they can produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for examples. Following the SVE strategy would give us ambiguous types for the type legalizer.

To solve this we would like to use a struct for the segment load/stores to separate them in IR. Since clang needs an address for every variable and needs to be able to load/store them we need to support load/store/alloca.

We want this to support the segment load/store intrinsics defined here https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md These return 2 to 8 vectors that have been loaded into consecutive registers. I believe SVE has similar instructions. I believe SVE represents these using types wider than their normal scalable vector types and relies on the type legalizer to split them up in the backend. This works for SVE because there is only one known minimum size for all scalable vector types so the type legalizer will always split down to that minimum type.

Thanks for providing the context!

For RISC-V vectors we already use 7 different sizes of scalable vectors to represent the ability of our instructions to operate on 2, 4, or 8 registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers. The segment load/store instructions add an extra dimension where they can produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for examples. Following the SVE strategy would give us ambiguous types for the type legalizer.

How does that look in terms of IR? Is the number of registers somehow represented in the (LLVM IR) vector type? Or are the types the same, but the compiler generates different code depending on what mode is set? For SVE we know we can split the vector because <vscale x 8 x i32> is twice the size of <vscale x 4 x i32>, regardless of the value for vscale. Indeed we know SVE vectors area multiple of 128bits, and therefore that <vscale x 4 x i32> is legal. In order to make any assumptions about splitting/legalization, the compiler will need to know which types are legal, and so would expect the compiler to know the mode (2, 4 ,8) for RVV when generating the code, and therefore have similar knowledge about which types are legal and how the vectors are represented/split into registers. How does that lead to ambiguous types?

To solve this we would like to use a struct for the segment load/stores to separate them in IR. Since clang needs an address for every variable and needs to be able to load/store them we need to support load/store/alloca.

These (C/C++-level) intrinsics are probably implemented using target-specific intrinsics or perhaps a common LLVM IR intrinsic like masked.load, which should be able to take/return a struct with scalable members after D94142. If so, it should be possible to handle this in Clang by emitting extractvalue instructions and storing each member individually. That would avoid any changes to LLVM IR. Is that something you've considered?

If we do need to make this work for scalable vectors, I think it needs a message to the mailing list because it's a change to the LangRef and capabilities of scalable vectors, given previous discussions on this topic. I'd like to avoid giving the impression that we're quietly moving the goalpost on what scalable vectors can do in IR.

HsiangKai retitled this revision from [IR] Permit load/store/alloca for struct with the same scalable vectors. to [PoC][IR] Permit load/store/alloca for struct with the same scalable vectors..Mar 13 2021, 5:17 AM

This patch is a proof of concept patch. We should have a formal discussion about the idea before reviewing this patch.

Some related patches:
We model segment load/store types in D97264.
We intend to separate the uses of scalable types and scalable vector types in D98161.

HsiangKai updated this revision to Diff 331747.Mar 18 2021, 8:07 PM
  • Change the return type of getSizeInBytes() to TypeSize.
  • Mark the places we need to take care.

Use TypeSize for offsets instead of StackOffset.

The memebers are all scalable or all fixed objects. We could use TypeSize for offsets. In other places in the current implementation, it uses TypeSize as the offset type. Use TypeSize for member offsets is more consistent with the current implementation.

HsiangKai updated this revision to Diff 332911.Wed, Mar 24, 2:46 AM

Fix test failed.

HsiangKai updated this revision to Diff 333660.Sat, Mar 27, 1:35 AM

Consider scalable struct types in passes processing StructType and remove TODO comments.

craig.topper added inline comments.Mon, Mar 29, 1:27 PM
llvm/include/llvm/IR/DataLayout.h
628–629

The is special code in DataLayout::getStructLayout that allocates the size of StructLayout to account for the needed size of MemberOffsets. That code does not appear to be updated in this patch.

I'm not sure we want to use SmallVector here. The minimum size is increasing the size of every StructLayout object. SmallVector contains the number of elements which is redundant with the NumElements in the StructLayout itself. SmallVector also contains an additional field for the capacity of the allocated memory.

Can we go back to the variable sized array and update DataLayout::getStructLayout to use TypeSize instead of uint64_t to calculate the needed space?

llvm/lib/Analysis/MemoryBuiltins.cpp
354

Was it already a bug that scalable vectors weren't checked for here?

llvm/lib/Analysis/ScalarEvolution.cpp
3447

"could not" -> "cannot"

llvm/lib/CodeGen/Analysis.cpp
92

Can we remove the conditional here and always get the the struct layout? Then we don't need to check for null StructLayout in the loop. This was one of the change we did previously when we allowed scalable vectors in structs for intrinsic returns.

112

Is the operand reordering required? I would hope that TypeSize multiplied by unsigned can be in either order.

llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
773

Does this need to be addressed as part of this review?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1891

Maybe we need a new entrypoint to ComputeValueVTs that does this automatically based on the type passed to argument 2?

llvm/lib/IR/Type.cpp
187

Please address this lint warnig

188

I'm slightly concerned that containsScalableVectorType() is recursive and we're now calling isScalableType in a bunch of places. This could be bad for deeply nested structs.

On that subject, are we allowing or preventing struct of scalable vectors to be part of other structs?

543

One -> Element.

HsiangKai updated this revision to Diff 333993.Mon, Mar 29, 3:06 PM
  • Add an assertion to check if it is a scalable struct type before creating GEP instructions.
  • Address comments.
HsiangKai marked 7 inline comments as done.Mon, Mar 29, 3:09 PM
HsiangKai updated this revision to Diff 334029.Mon, Mar 29, 6:22 PM

Fix build fail.

craig.topper added inline comments.Mon, Mar 29, 7:29 PM
llvm/lib/CodeGen/Analysis.cpp
92

Nevermind. We need this to support a mix of scalable and scalars for intrinsic returns.

HsiangKai updated this revision to Diff 334143.Tue, Mar 30, 6:50 AM
HsiangKai edited the summary of this revision. (Show Details)

Use TrailingObjects for MemberOffsets.

@HsiangKai can you rebase this on trunk. Some of the TrailingObjects changes would go away.