This is an archive of the discontinued LLVM Phabricator instance.

Differential D53695

Scalable VectorType RFC
AbandonedPublic

Authored by huntergr on Oct 25 2018, 5:56 AM.

Details

Reviewers

lattner
hfinkel
rkruppe
simoll
rengolin
chandlerc
echristo

Summary

This contains the RFC for the scalable vector type alone, without the intrinsics.

This document will not be committed to the codebase; it is only for review.

I will post additions/changes to the RFC here.

Event Timeline

Updated based on discussions at the 2018 devmeeting

rogfer01 added a subscriber: rogfer01.Oct 25 2018, 8:35 AM

This is looking good to me, thanks!

A few comments inline.

Also, can you move this to docs/Proposals, together with the other RFCs, please?

docs/ScalableVectorType.rst
41	Precisely. This is an optimisation problem, not an IR representation one.
46	It'd be good to have a small paragraph on why not for both restrictions.
56	It may not be that simple. If the middle-end has assumed scalable vectors exist, then it has also added wrapping code for the predication and runtime checks of induction evolution, and has potentially used intrinsics to represent concepts that cannot be expressed in plain IR. If that's the case, then dropping the scalable flag would make a lot of other things fail, potentially not in the legalisation phase, but further down the line, which will be much harder to debug. I think a safer assumption is: if the target doesn't support scalable vectors, then scalable vectors are not legal and need to fail.
110	Perfect.

This seems like yet another step in the right direction. Of course I may be biased as I've already been happy with previous iterations.

docs/ScalableVectorType.rst
48	Handling those in the frontend instead sounds like a good tradeoff for how much this simplifies the IR and the size queries in particular, kudos!
56	I don't think having a primitive IR type be not functional at all on the majority of targets will be acceptable. I also don't understand what problems you are seeing: any scalable vector code the middle end generates has to be valid if the processor implementation the code runs on happens to use a multiple (vscale) of 1, right? Dropping `scalable` in legalization just means specializing for this case (vscale=1) at compile time. If target-specific intrinsics are used that can't sensibly be implemented on another target (e.g., SVE first-fault loads), then of course instruction selection should fail. Intrinsics like `vscale()` or `stepvector()` are trivial to implement on any fixed-width SIMD architecture though.
107	Since aggregates containing scalable vectors aren't part of this proposal any more, is one of these two integers always zero? If so, how about instead returning one integer and a flag indicating if it needs to be scaled by the runtime multiple (similar to VectorType::ElementCount)?

In D53695#1277187, @rengolin wrote:

Also, can you move this to docs/Proposals, together with the other RFCs, please?

Ah, I didn't spot that. Will do. Does phabricator automatically copy across the inline comments?

docs/ScalableVectorType.rst
56	Our original approach assumed that scalable vectors being illegal on other targets would be sufficient, but the feedback I got at the dev meeting indicated otherwise -- hence this change. As Robin says, the IR must be valid for vscale being 1 (as well as any other value), and the proposed intrinsics can easily be lowered to something appropriate for fixed-length vectors.
107	Yes, you're right -- I was still thinking about cases where we could have mixed comparisons in future if we found a use case, but without aggregates we only need to distinguish between scalable/non-scalable quantities. This would also have the side effect of simplifying the comparison operations further, in that I'd only have to check that the flags were equal on both sides instead of checking for both sides being 0 for one term before returning the comparison of the other term (and then checking them again in the other order). Thanks.

rengolin added inline comments.Oct 31 2018, 6:48 AM

docs/ScalableVectorType.rst
56	My worry is about different support on the different units. Specifically, SVE supports scatter/gather, predication, etc. which NEON doesn't. So, for Arm, generating scalable code and lowering to NEON may not only be terrible for performance, but could end up exposing a host of illegal lowering scenarios. I'm not saying we should forbid them by definition, but we could make them hard fail now and add them, case by case, when we have analysed and found them to be benign. I'm also not strongly set on this either, I'm just worried about safety. If everyone is clearly convinced this is safe in all cases, then I drop my argument. :)

Throwing in my 2 cents on the legalization issue. Apart from that, LGTM.

simoll added inline comments.Oct 31 2018, 7:11 AM

docs/ScalableVectorType.rst
56	Siding with @rengolin here: let the backend crash if it does not support scalable types. Vectorizers should simply not generate scalable types if they are not supported by the target. Legalizing scalable to unscaled IR is hard to get right (the legality issues already came up). It's worse if you expect fast SIMD to come out of this.. basically you open up the cost modelling can all over, considering TTI/TLI.. just like in LV/VPlan. If an application for this legalization pops up down the road, this can be revisited (WebAssembly?).

rkruppe added inline comments.Oct 31 2018, 7:51 AM

docs/ScalableVectorType.rst
56	Nobody's expecting good code to come out of writing scalable vector code and compiling it for NEON or SSE or something, just as you can already generate atrocious code by e.g. generating 128 bit masked gathers & scatters while compiling for NEON (those intrinsics are expanded into a sequence of conditional scalar loads/stores if they're not legal). Still, in general it's nice to have all backends support most IR constructs functionally though inefficiently. Among other things, it helps ensure that the semantics are not tied to a specific target (so it can be implemented by other future backends) and allows running test programs to understand those semantics without needing a particular chip or emulator. This is aspirational rather than a hard rule; there is and will always be a large body of seemingly simple LLVM IR that asserts somewhere in CodeGen, but I see absolutely no reason to rule it out from the get-go in this case. If some specific intrinsic can't be legalized reasonably and correctly, then don't legalize it and let ISel crash on that specific operation. But many things can be legalized correctly with little effort (and already are for fixed-width vectors).

Updated based on feedback:

Add reasons for restrictions on global/aggregates
Change size query struct to an integer + a boolean instead of two integers.

I didn't move it to the proposals directory, since I'm not sure if we would lose the inline comments that way.

huntergr marked 2 inline comments as done.Nov 2 2018, 4:36 PM

Update based on off-list feedback:

Added a section on allowed operations to clarify this is a first class type.
Remove initial proposal on the flag for not inheriting vlen; there's more pushback against allowing runtime multiple changes. I think this design could still be extended to support that in the future, but I'm afraid I'll have to leave that battle to the RVV team.
Minor wording changes.

In D53695#1286251, @huntergr wrote:

I didn't move it to the proposals directory, since I'm not sure if we would lose the inline comments that way.

That's ok. You can move after approval and before commit.

In D53695#1376902, @huntergr wrote:

Added a section on allowed operations to clarify this is a first class type.

Remove initial proposal on the flag for not inheriting vlen; there's more pushback against allowing runtime multiple changes. I think this design could still be extended to support that in the future, but I'm afraid I'll have to leave that battle to the RVV team.

Minor wording changes.

Those changes look good to me. I think we converged into a good solution.

I'll approve now, but please wait a day or two for the last comments from the other reviewers.

Thanks!

docs/ScalableVectorType.rst
56	I concede, given that this is just a proposal. It may well be the case that legalising scalable vectors to non-scalable ones will prove hard and we'll end up adding a number of `llvm_unreachable` cases to protect it. I don't foresee generating scalable vectors by default on any target anyway, so this will always be an edge case at best.

This revision is now accepted and ready to land.Jan 30 2019, 8:23 AM

simoll accepted this revision.Jan 31 2019, 6:49 AM

Very outdated.

Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2023, 6:39 AM

Herald added subscribers: wangpc, mgabka, alextsao1999 and 2 others. · View Herald Transcript

Revision Contents

Path: docs/ScalableVectorType.rst
Size: 138 lines

Diff 184268

docs/ScalableVectorType.rst

This file was added.

				=============================================================
				Extending VectorType to support scalable vector architectures
				=============================================================

				To represent a vector of unknown length a boolean `Scalable` property has been
				added to the `VectorType` class, which indicates that the number of elements in
				the vector is a runtime-determined integer multiple of the `NumElements` field.
				Most code that deals with vectors doesn't need to know the exact length, but
				does need to know relative lengths -- e.g. get a vector with the same number of
				elements but a different element type, or with half or double the number of
				elements.

				In order to allow code to transparently support scalable vectors, we introduce
				an `ElementCount` class with two members:

				- `unsigned Min`: the minimum number of elements.
				- `bool Scalable`: is the element count an unknown multiple of `Min`?

				For non-scalable vectors (``Scalable=false``) the scale is considered to be
				equal to one and thus `Min` represents the exact number of elements in the
				vector.

				The intent for code working with vectors is to use convenience methods and avoid
				directly dealing with the number of elements. If needed, calling
				`getElementCount` on a vector type instead of `getVectorNumElements` can be used
				to obtain the (potentially scalable) number of elements. Overloaded division and
				multiplication operators allow an ElementCount instance to be used in much the
				same manner as an integer for most cases.
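As a rough illustration of the paragraphs above, an `ElementCount` with scaling operators could look like the sketch below. This is not the class from the patch: the equality operator, the exact-division assert, and the helper `halfElements` are assumptions made for the example.

```cpp
#include <cassert>

// Illustrative sketch only; not the class from the patch.
struct ElementCount {
  unsigned Min;   // minimum number of elements
  bool Scalable;  // true if the real count is an unknown multiple of Min

  // Scaling keeps the Scalable flag, so relative lengths stay meaningful
  // even though the absolute length is unknown at compile time.
  ElementCount operator*(unsigned RHS) const { return {Min * RHS, Scalable}; }

  // Assumption for this sketch: only exact divisions are allowed.
  ElementCount operator/(unsigned RHS) const {
    assert(RHS != 0 && Min % RHS == 0 && "division must be exact");
    return {Min / RHS, Scalable};
  }

  bool operator==(const ElementCount &RHS) const {
    return Min == RHS.Min && Scalable == RHS.Scalable;
  }
};

// Hypothetical helper: element count of a vector with half as many elements.
inline ElementCount halfElements(ElementCount EC) { return EC / 2; }
```

With this shape, `ElementCount{4, true} * 2` yields a count of `{8, true}`, and code that only needs "half" or "double" never has to inspect `Scalable` itself.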

				This mixture of compile-time and runtime quantities allows us to reason about the
				relationship between different scalable vector types without knowing their
				exact length.

				The runtime multiple is not expected to change during program execution for SVE
				or the SX Aurora architecture; execution behaviour is undefined for SVE in that
				case, as the upper register contents (past 128b) are not architecturally defined
				after a change of vector length in privileged code.

				For the RISC-V V extension, it may change on a per-function basis, which
				precludes interprocedural optimizations. I leave working out the details of any
				future extensions to scalable vector types in this manner to those working on
				RVV.

				Allowed Operations
				------------------

				All operations allowed on first class types will work on scalable vector types,
				including loads, stores, and PHI operations.

				Scalable vector Values can be spilled to/filled from the stack, with the details
				on stack frame layout left to the target.

				Restrictions
				------------

				Global variables cannot be scalable vector types, because they need to be a
				fixed size in the appropriate section of a resulting binary.

				Scalable vector types cannot be members of StructType or ArrayType aggregates,
				because those aggregates are defined to be fixed size; while the aggregate
				types could be extended as well, that isn't required to support scalable
				vectors and would need more code changes.

				Supporting scalable vectors in C structs (as in Arm's proposed ACLE for SVE)
				will still be possible by having clang lower the struct to a plain pointer
				with offsets scaled by the runtime multiple.
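The offset arithmetic implied by that lowering can be sketched as follows. The `ScalableMember` description, the function `memberOffset`, and the assumption that members are packed back to back with no padding are all hypothetical; the RFC does not specify how clang would compute these offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one scalable-vector member of a C struct.
struct ScalableMember {
  unsigned MinElts;   // `n` in <scalable n x type>
  unsigned EltBytes;  // size of one element in bytes
};

// Byte offset of member `Index`, assuming members are packed back to back
// with no padding and `VScale` is the runtime multiple.
inline std::size_t memberOffset(const std::vector<ScalableMember> &Members,
                                std::size_t Index, std::size_t VScale) {
  assert(Index < Members.size() && "member index out of range");
  std::size_t MinBytes = 0;
  for (std::size_t I = 0; I < Index; ++I)
    MinBytes += static_cast<std::size_t>(Members[I].MinElts) *
                Members[I].EltBytes;
  // Only this multiplication depends on the hardware vector length.
  return MinBytes * VScale;
}
```

The compile-time part (`MinBytes`) is a constant per member, so the frontend would only need to emit one multiply by the runtime multiple per access.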

				Legalization
				------------

				To legalize a scalable vector IR type to SelectionDAG types, the same procedure
				is used as for fixed-length vectors, with one minor difference:

				- If the target does not support scalable vectors, the runtime multiple is
				assumed to be a constant '1' and the scalable flag is dropped. Legalization
				proceeds as normal after this.
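The rule above amounts to a small type rewrite before normal legalization. The sketch below is a simplification under stated assumptions: `VecTy` and `legalizeScalableFlag` are hypothetical names, and target support is reduced to a single boolean rather than a real target query.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector type: minimum element count plus flag.
struct VecTy {
  unsigned MinElts;
  bool Scalable;
};

// If the target has no scalable vector support, treat the runtime multiple
// as the constant 1, which is the same as dropping the scalable flag.
inline VecTy legalizeScalableFlag(VecTy Ty, bool TargetSupportsScalable) {
  assert(Ty.MinElts > 0 && "vector must have at least one element");
  if (Ty.Scalable && !TargetSupportsScalable)
    return {Ty.MinElts, false};
  return Ty;  // fixed-length types and supported scalable types are unchanged
}
```

For example, `<scalable 4 x i32>` would be handled as `<4 x i32>` on a target without scalable vectors, and left alone on one that supports them.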

				IR Textual Form
				---------------

				The textual form for a scalable vector is:

				``<scalable <n> x <type>>``

				where `type` is the scalar type of each element, `n` is the minimum number of
				elements, and the string literal `scalable` indicates that the total number of
				elements is an unknown multiple of `n`; `scalable` is just an arbitrary choice
				for indicating that the vector is scalable, and could be substituted by another.
				For fixed-length vectors, the `scalable` is omitted, so there is no change in
				the format for existing vectors.

				Scalable vectors with the same `Min` value have the same number of elements, and
				the same number of bytes if `Min * sizeof(type)` is the same (assuming they are
				used within the same function):

				``<scalable 4 x i32>`` and ``<scalable 4 x i8>`` have the same number of
				elements.

				``<scalable 4 x i32>`` and ``<scalable 8 x i16>`` have the same number of
				bytes.
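The textual form and the two equivalences above can be checked with a small sketch. It is restricted to integer element types for brevity, and `VecDesc`, `toText`, `sameNumElements`, and `sameNumBytes` are hypothetical names, not part of the proposal. Both comparisons assume the two types are used within the same function, so they share one runtime multiple.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a vector of iN elements.
struct VecDesc {
  unsigned Min;      // minimum number of elements
  unsigned EltBits;  // width N of the integer element type iN
  bool Scalable;
};

// Prints <scalable n x iN> for scalable vectors and <n x iN> otherwise.
inline std::string toText(const VecDesc &V) {
  assert(V.Min > 0 && "vector must have at least one element");
  return std::string("<") + (V.Scalable ? "scalable " : "") +
         std::to_string(V.Min) + " x i" + std::to_string(V.EltBits) + ">";
}

// Same element count: equal Min and equal scalability.
inline bool sameNumElements(const VecDesc &A, const VecDesc &B) {
  return A.Scalable == B.Scalable && A.Min == B.Min;
}

// Same size: equal Min * element width and equal scalability.
inline bool sameNumBytes(const VecDesc &A, const VecDesc &B) {
  return A.Scalable == B.Scalable && A.Min * A.EltBits == B.Min * B.EltBits;
}
```

Here `toText` of `{4, 32, true}` gives `<scalable 4 x i32>`, while the same description with `Scalable` cleared gives the existing fixed-length spelling `<4 x i32>`.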

				IR Bitcode Form
				---------------

				To serialize scalable vectors to bitcode, a new boolean field is added to the
				type record if the vector is scalable, and is omitted if fixed-length. This
				preserves backwards compatibility with existing bitcode.
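A minimal sketch of that compatibility scheme, assuming a type record is a flat list of integers: the two-field layout, the field order, and the names `VectorTypeInfo`, `encodeVectorType`, and `decodeVectorType` are assumptions for illustration, not the actual bitcode writer or reader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct VectorTypeInfo {
  std::uint64_t MinElts;
  std::uint64_t EltTypeID;  // hypothetical ID of the element type
  bool Scalable;
};

using Record = std::vector<std::uint64_t>;

// The scalable field is appended only when set, so records for
// fixed-length vectors keep their existing shape.
inline Record encodeVectorType(const VectorTypeInfo &Ty) {
  Record R = {Ty.MinElts, Ty.EltTypeID};
  if (Ty.Scalable)
    R.push_back(1);
  return R;
}

// A record without the extra field decodes as a fixed-length vector,
// which is how older bitcode remains readable.
inline VectorTypeInfo decodeVectorType(const Record &R) {
  assert(R.size() >= 2 && "record too short");
  bool Scalable = R.size() > 2 && R[2] != 0;
  return {R[0], R[1], Scalable};
}
```

Because the reader treats a missing field as "not scalable", existing files need no upgrade step.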

				Size Queries
				------------

				This is a proposal for how to deal with querying the size of scalable types for
				analysis of IR.

				For current IR types that have a known size, all query functions return a single
				integer constant. For scalable types, a boolean flag is needed as well, to
				indicate that the returned number of bytes/bits must be scaled by the runtime
				multiple to obtain the actual length.

				For primitive types, `getPrimitiveSizeInBits()` will function as it does today,
				except that if the type is a VectorType marked as Scalable, it will assert.

				A new function `getScalableSizeInBits()` will be added, which returns a struct
				containing an integer to represent the minimum size and a boolean flag to
				indicate that the minimum size is scaled by the runtime multiple. For backends
				that do not need to deal with scalable types the existing methods will suffice,
				but an assert will be added to them to ensure they aren't used on scalable
				types. This will reduce the number of code changes required.

				Similar functionality will be added to DataLayout, and the struct definition
				will be shared between them.

				Comparisons between unscaled-only or scaled-only sizes will work as expected,
				and convenience operators will be provided. For now, if unscaled sizes are
				compared against scaled sizes, the comparison operator will assert. This
				restriction may be relaxed in future if a valid use case is found.

				Comparisons between scaled sizes with different runtime multiples are invalid.
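The size struct and its comparison rules could be sketched as below. `ScalableSize` and `sizeInBits` are hypothetical names, and the sketch assumes both operands come from the same function, so scaled sizes share one runtime multiple.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical size result: a minimum size plus a flag saying whether it
// must be multiplied by the runtime multiple.
struct ScalableSize {
  std::uint64_t MinSize;
  bool Scaled;

  // Mixed scaled/unscaled comparisons assert, as the proposal describes.
  bool operator==(const ScalableSize &RHS) const {
    assert(Scaled == RHS.Scaled && "cannot compare scaled with unscaled");
    return MinSize == RHS.MinSize;
  }

  bool operator<(const ScalableSize &RHS) const {
    assert(Scaled == RHS.Scaled && "cannot compare scaled with unscaled");
    return MinSize < RHS.MinSize;
  }
};

// Hypothetical query: minimum size in bits of a vector type.
inline ScalableSize sizeInBits(unsigned MinElts, unsigned EltBits,
                               bool Scalable) {
  return {static_cast<std::uint64_t>(MinElts) * EltBits, Scalable};
}
```

Since both sides are checked with a single flag comparison, there is no need to test each term for zero as the earlier two-integer design required.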