This is an archive of the discontinued LLVM Phabricator instance.


Differential D53695

Scalable VectorType RFC
AbandonedPublic

Authored by huntergr on Oct 25 2018, 5:56 AM.

Details

Reviewers

lattner
hfinkel
rkruppe
simoll
rengolin
chandlerc
echristo

Summary

This contains the RFC for the scalable vector type alone, without the intrinsics.

This document will not be committed to the codebase; it is only for review.

I will post additions/changes to the RFC here.

Diff Detail

Event Timeline

Updated based on discussions at the 2018 devmeeting

rogfer01 added a subscriber: rogfer01. Oct 25 2018, 8:35 AM

This is looking good to me, thanks!

A few comments inline.

Also, can you move this to docs/Proposals, together with the other RFCs, please?

docs/ScalableVectorType.rst
41	Precisely. This is an optimisation problem, not an IR representation one.
46	It'd be good to have a small paragraph on why not for both restrictions.
56	It may not be that simple. If the middle-end has assumed scalable vectors exist, then it has also added wrapping code for the predication and runtime checks of induction evolution, and has potentially used intrinsics to represent concepts that cannot be expressed in plain IR. If that's the case, then dropping the scalable would make a lot of other things fail, potentially not in the legalisation phase but further down the line, which will be much harder to debug. I think a safer assumption is: if the target doesn't support scalable vectors, then scalable vectors are not legal and need to fail.
110	Perfect.

This seems like yet another step in the right direction. Of course I may be biased as I've already been happy with previous iterations.

docs/ScalableVectorType.rst
48	Handling those in the frontend instead sounds like a good tradeoff for how much this simplifies the IR and the size queries in particular, kudos!
56	I don't think having a primitive IR type be not functional at all on the majority of targets will be acceptable. I also don't understand what problems you are seeing: any scalable vector code the middle end generates has to be valid if the processor implementation the code runs on happens to use a multiple (vscale) of 1, right? Dropping `scalable` in legalization just means specializing for this case (vscale=1) at compile time. If target-specific intrinsics are used that can't sensibly be implemented on another target (e.g., SVE first-fault loads), then of course instruction selection should fail. Intrinsics like `vscale()` or `stepvector()` are trivial to implement on any fixed-width SIMD architecture though.
107	Since aggregates containing scalable vectors aren't part of this proposal any more, is one of these two integers always zero? If so, how about instead returning one integer and a flag indicating if it needs to be scaled by the runtime multiple (similar to VectorType::ElementCount)?

In D53695#1277187, @rengolin wrote:

Also, can you move this to docs/Proposals, together with the other RFCs, please?

Ah, I didn't spot that. Will do. Does phabricator automatically copy across the inline comments?

docs/ScalableVectorType.rst
56	Our original approach assumed that scalable vectors being illegal on other targets would be sufficient, but the feedback I got at the dev meeting indicated otherwise -- hence this change. As Robin says, the IR must be valid for vscale being 1 (as well as any other value), and the proposed intrinsics can easily be lowered to something appropriate for fixed-length vectors.
107	Yes, you're right -- I was still thinking about cases where we could have mixed comparisons in future if we found a use case, but without aggregates we only need to distinguish between scalable/non-scalable quantities. This would also have the side effect of simplifying the comparison operations further, in that I'd only have to check that the flags were equal on both sides instead of checking for both sides being 0 for one term before returning the comparison of the other term (and then checking them again in the other order). Thanks.

rengolin added inline comments. Oct 31 2018, 6:48 AM

docs/ScalableVectorType.rst
56	My worry is about different support on the different units. Specifically, SVE supports scatter/gather, predication, etc. which NEON doesn't. So, for Arm, generating scalable code and lowering to NEON may not only be terrible for performance, but could end up exposing a host of illegal lowering scenarios. I'm not saying we should forbid them by definition, but we could make them hard fail now and add them, case by case, when we have analysed and found them to be benign. I'm also not strongly set on this either, I'm just worried about safety. If everyone is clearly convinced this is safe in all cases, then I drop my argument. :)

Throwing in my 2 cents on the legalization issue. Apart from that, LGTM.

simoll added inline comments. Oct 31 2018, 7:11 AM

docs/ScalableVectorType.rst
56	Siding with @rengolin here: let the backend crash if it does not support scalable types. Vectorizers should simply not generate scalable types if they are not supported by the target. Legalizing scalable to unscaled IR is hard to get right (the legality issues already came up). It's worse if you expect fast SIMD to come out of this.. basically you open up the cost modelling can all over, considering TTI/TLI.. just like in LV/VPlan. If an application for this legalization pops up down the road, this can be revisited (WebAssembly?).

rkruppe added inline comments. Oct 31 2018, 7:51 AM

docs/ScalableVectorType.rst
56	Nobody's expecting good code to come out of writing scalable vector code and compiling it for NEON or SSE or something, just as you can already generate atrocious code by e.g. generating 128 bit masked gathers & scatters while compiling for NEON (those intrinsics are expanded into a sequence of conditional scalar loads/stores if they're not legal). Still, in general it's nice to have all backends support most IR constructs functionally though inefficiently. Among other things, it helps ensure that the semantics are not tied to a specific target (so it can be implemented by other future backends) and allows running test programs to understand those semantics without needing a particular chip or emulator. This is aspirational rather than a hard rule; there is and will always be a large body of seemingly simple LLVM IR that asserts somewhere in CodeGen, but I see absolutely no reason to rule it out from the get-go in this case. If some specific intrinsic can't be legalized reasonably and correctly, then don't legalize it and let ISel crash on that specific operation. But many things can be legalized correctly with little effort (and already are for fixed-width vectors).

Updated based on feedback:

Add reasons for restrictions on global/aggregates
Change size query struct to an integer + a boolean instead of two integers.

I didn't move it to the proposals directory, since I'm not sure if we would lose the inline comments that way.

huntergr marked 2 inline comments as done. Nov 2 2018, 4:36 PM

Update based on off-list feedback:

Added a section on allowed operations to clarify this is a first class type.
Remove initial proposal on the flag for not inheriting vlen; there's more pushback against allowing runtime multiple changes. I think this design could still be extended to support that in the future, but I'm afraid I'll have to leave that battle to the RVV team.
Minor wording changes.

In D53695#1286251, @huntergr wrote:

I didn't move it to the proposals directory, since I'm not sure if we would lose the inline comments that way.

That's ok. You can move after approval and before commit.

In D53695#1376902, @huntergr wrote:

Added a section on allowed operations to clarify this is a first class type.

Remove initial proposal on the flag for not inheriting vlen; there's more pushback against allowing runtime multiple changes. I think this design could still be extended to support that in the future, but I'm afraid I'll have to leave that battle to the RVV team.

Minor wording changes.

Those changes look good to me. I think we converged into a good solution.

I'll approve now, but please wait a day or two for the last comments from the other reviewers.

Thanks!

docs/ScalableVectorType.rst
56	I concede, given that this is just a proposal. It may well be the case that legalising scalable vectors to non-scalable ones will prove hard and we'll end up adding a number of `llvm_unreachable` cases to protect it. I don't foresee generating scalable vectors by default on any target anyway, so this will always be an edge case at best.

This revision is now accepted and ready to land. Jan 30 2019, 8:23 AM

simoll accepted this revision. Jan 31 2019, 6:49 AM

Very outdated.

Herald added a project: Restricted Project. Jul 18 2023, 6:39 AM

Herald added subscribers: wangpc, mgabka, alextsao1999 and 2 others.

Revision Contents

Path: docs/ScalableVectorType.rst

Size: 132 lines

Diff 171073

docs/ScalableVectorType.rst

This file was added.

				=============================================================
				Extending VectorType to support scalable vector architectures
				=============================================================

				To represent a vector of unknown length a boolean `Scalable` property has been
				added to the `VectorType` class, which indicates that the number of elements in
				the vector is a runtime-determined integer multiple of the `NumElements` field.
				Most code that deals with vectors doesn't need to know the exact length, but
				does need to know relative lengths -- e.g. get a vector with the same number of
				elements but a different element type, or with half or double the number of
				elements.

				In order to allow code to transparently support scalable vectors, we introduce
				an `ElementCount` class with two members:

				- `unsigned Min`: the minimum number of elements.
				- `bool Scalable`: is the element count an unknown multiple of `Min`?

				For non-scalable vectors (``Scalable=false``) the scale is considered to be
				equal to one and thus `Min` represents the exact number of elements in the
				vector.

				The intent for code working with vectors is to use convenience methods and avoid
				directly dealing with the number of elements. If needed, calling
				`getElementCount` on a vector type instead of `getVectorNumElements` can be used
				to obtain the (potentially scalable) number of elements. Overloaded division and
				multiplication operators allow an ElementCount instance to be used in much the
				same manner as an integer for most cases.
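The `ElementCount` idea described above can be sketched roughly as follows. This is an illustrative stand-in rather than the class from the patch: only the `Min` and `Scalable` members come from the text, while the operator bodies, the exact-division assumption, and the `halfElements` helper are assumptions made for the example.

```cpp
#include <cassert>

// Illustrative stand-in for the proposed ElementCount; not the LLVM class.
struct ElementCount {
  unsigned Min;   // minimum number of elements
  bool Scalable;  // true if the real count is an unknown multiple of Min

  // Multiplying the count leaves the Scalable flag untouched.
  ElementCount operator*(unsigned RHS) const { return {Min * RHS, Scalable}; }

  // Assumption for this sketch: only exact division is allowed.
  ElementCount operator/(unsigned RHS) const {
    assert(RHS != 0 && Min % RHS == 0 && "division must be exact");
    return {Min / RHS, Scalable};
  }

  bool operator==(const ElementCount &RHS) const {
    return Min == RHS.Min && Scalable == RHS.Scalable;
  }
};

// Hypothetical helper: the count for a vector with half as many elements.
inline ElementCount halfElements(ElementCount EC) { return EC / 2; }
```

With this shape, code that halves or doubles a vector never inspects the runtime multiple; a scalable count of 4 simply becomes a scalable count of 2 or 8.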

				This mixture of compile-time and runtime quantities allows us to reason about the
				relationship between different scalable vector types without knowing their
				exact length.

				The runtime multiple is not expected to change during program execution for SVE,
				but it is possible. The model of scalable vectors presented in this RFC assumes
				that the multiple will be constant within a function but not necessarily across
				functions. As suggested in the recent RISC-V RFC, a new function attribute to
				inherit the multiple across function calls will allow for function calls with
				vector arguments/return values and inlining/outlining optimizations.

				IR Textual Form
				---------------

				The textual form for a scalable vector is:

				``<scalable <n> x <type>>``

				where `type` is the scalar type of each element, `n` is the minimum number of
				elements, and the string literal `scalable` indicates that the total number of
				elements is an unknown multiple of `n`; `scalable` is just an arbitrary choice
				for indicating that the vector is scalable, and could be substituted by another.
				For fixed-length vectors, the `scalable` is omitted, so there is no change in
				the format for existing vectors.

				Scalable vectors with the same `Min` value have the same number of elements, and
				the same number of bytes if `Min * sizeof(type)` is the same (assuming they are
				used within the same function):

				``<scalable 4 x i32>`` and ``<scalable 4 x i8>`` have the same number of
				elements.

				``<scalable 4 x i32>`` and ``<scalable 8 x i16>`` have the same number of
				bytes.
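The two relations in these examples can be checked with a small sketch. The `ScalableVec` struct and both helper names are hypothetical; the sketch assumes both types are used within the same function, so they share one runtime multiple and only the minimum quantities need comparing.

```cpp
#include <cassert>

// Hypothetical description of <scalable Min x iN>; names are illustrative.
struct ScalableVec {
  unsigned Min;       // minimum number of elements (the <n> in the text form)
  unsigned ElemBits;  // width of the scalar element type in bits
};

// Same element count: both types scale by the same runtime multiple,
// so comparing Min is sufficient.
inline bool sameNumElements(ScalableVec A, ScalableVec B) {
  return A.Min == B.Min;
}

// Same size in bytes: compare the minimum sizes, again because the
// runtime multiple is common to both types.
inline bool sameNumBytes(ScalableVec A, ScalableVec B) {
  assert(A.ElemBits % 8 == 0 && B.ElemBits % 8 == 0 &&
         "sketch only handles byte-sized elements");
  return A.Min * A.ElemBits == B.Min * B.ElemBits;
}
```

For instance, `{4, 32}` and `{4, 8}` have the same number of elements, while `{4, 32}` and `{8, 16}` have the same number of bytes.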

				IR Bitcode Form
				---------------

				To serialize scalable vectors to bitcode, a new boolean field is added to the
				type record. If the field is not present the type will default to a fixed-length
				vector type, preserving backwards compatibility.
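The backwards-compatibility rule can be illustrated with a reader that treats the flag as optional. The record layout used here (`[numelts, eltty, scalable]`) and all names are assumptions for the sketch; the RFC only states that a new boolean field is added and that its absence means a fixed-length vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DecodedVectorType {
  uint64_t NumElements;
  uint64_t ElementTypeID;
  bool Scalable;
};

// Assumed record layout for illustration only: [numelts, eltty, scalable?].
inline DecodedVectorType
decodeVectorRecord(const std::vector<uint64_t> &Record) {
  assert(Record.size() >= 2 &&
         "vector record needs a count and an element type");
  // A missing third field means the record predates the flag, so the
  // type defaults to a fixed-length vector.
  bool Scalable = Record.size() > 2 && Record[2] != 0;
  return {Record[0], Record[1], Scalable};
}
```

An old two-field record therefore decodes exactly as before, and only records that carry a non-zero third field produce a scalable type.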

				Size Queries
				------------

				This is a proposal for how to deal with querying the size of scalable types for
				analysis of IR. While it has not been implemented in full, the general approach
				works well for calculating offsets into structures with scalable types in a
				modified version of ComputeValueVTs in our downstream compiler.

				For current IR types that have a known size, all query functions return a single
				integer constant. For scalable types a second integer is needed to indicate the
				number of bytes/bits which need to be scaled by the runtime multiple to obtain
				the actual length.

				For primitive types, `getPrimitiveSizeInBits()` will function as it does today,
				except that it will no longer return a size for vector types (it will return 0,
				as it does for other derived types). The majority of calls to this function are
				already for scalar rather than vector types.

				For derived types, a function `getScalableSizePairInBits()` will be added, which
				returns a pair of integers (one to indicate unscaled bits, the other for bits
				that need to be scaled by the runtime multiple). For backends that do not need
				to deal with scalable types the existing methods will suffice, but a debug-only
				assert will be added to them to ensure they aren't used on scalable types.
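A rough sketch of the two query styles follows. Only the name `getScalableSizePairInBits` comes from the text; the `VecTy` struct and `getFixedSizeInBits` are hypothetical stand-ins, and the sketch covers vector types only.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical vector type description used only for this sketch.
struct VecTy {
  unsigned Min;       // minimum number of elements
  unsigned ElemBits;  // element width in bits
  bool Scalable;      // true for <scalable Min x iN>
};

// Returns {unscaled bits, bits to be scaled by the runtime multiple}.
inline std::pair<uint64_t, uint64_t> getScalableSizePairInBits(VecTy T) {
  uint64_t Bits = uint64_t(T.Min) * T.ElemBits;
  if (T.Scalable)
    return {0, Bits};
  return {Bits, 0};
}

// Stand-in for an existing single-integer query: valid only for
// fixed-length types, guarded by a debug-only assert.
inline uint64_t getFixedSizeInBits(VecTy T) {
  assert(!T.Scalable && "fixed-size query used on a scalable type");
  return uint64_t(T.Min) * T.ElemBits;
}
```

Backends that never see scalable types can keep calling the single-integer query, and the assert catches any accidental use on a scalable type in debug builds.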

				Similar functionality will be added to DataLayout.

				Comparisons between sizes will use the following methods, assuming that X and
				Y are non-zero integers and the form is of { unscaled, scaled }.

				{ X, 0 } <cmp> { Y, 0 }: Normal unscaled comparison.

				{ 0, X } <cmp> { 0, Y }: Normal comparison within a function, or across
				functions that inherit vector length. Cannot be
				compared across non-inheriting functions.

				{ X, 0 } > { 0, Y }: Cannot return true.

				{ X, 0 } = { 0, Y }: Cannot return true.

				{ X, 0 } < { 0, Y }: Can return true.

				{ Xu, Xs } <cmp> { Yu, Ys }: Gets complicated, need to subtract common
				terms and try the above comparisons; it
				may not be possible to get a good answer.
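The comparison rules listed above can be sketched as conservative predicates that return true only when the relation holds for every possible runtime multiple. The names `SizePair`, `isKnownLess` and `isKnownEqual` are hypothetical, the sketch assumes the multiple is an unknown integer of at least 1 that is shared by both operands, and it simply gives up on the mixed case.

```cpp
#include <cassert>
#include <cstdint>

// { unscaled, scaled } size in bits. The actual size is
// Unscaled + Scaled * multiple, for an unknown runtime multiple >= 1.
struct SizePair {
  uint64_t Unscaled;
  uint64_t Scaled;
};

// True only if A < B is guaranteed for every runtime multiple.
inline bool isKnownLess(SizePair A, SizePair B) {
  bool AFixed = A.Scaled == 0, BFixed = B.Scaled == 0;
  bool AScaled = A.Unscaled == 0, BScaled = B.Unscaled == 0;
  if (AFixed && BFixed)    // { X, 0 } < { Y, 0 }
    return A.Unscaled < B.Unscaled;
  if (AScaled && BScaled)  // { 0, X } < { 0, Y }
    return A.Scaled < B.Scaled;
  if (AFixed && BScaled)   // { X, 0 } < { 0, Y }: the multiple 1 is the worst case
    return A.Unscaled < B.Scaled;
  return false;            // { 0, Y } < { X, 0 } and mixed pairs: not provable here
}

// True only if A == B is guaranteed for every runtime multiple.
inline bool isKnownEqual(SizePair A, SizePair B) {
  if (A.Scaled == 0 && B.Scaled == 0)
    return A.Unscaled == B.Unscaled;
  if (A.Unscaled == 0 && B.Unscaled == 0)
    return A.Scaled == B.Scaled;
  return false;            // scaled vs. unscaled can never be proven equal
}
```

A "greater than" query can be answered by swapping the arguments of `isKnownLess`, which reproduces the rule that `{ X, 0 } > { 0, Y }` cannot return true.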

				It's worth noting that we don't expect the last case (mixed scaled and
				unscaled sizes) to occur. Richard Sandiford's proposed C extensions
				(http://lists.llvm.org/pipermail/cfe-dev/2018-May/057830.html) explicitly
				prohibit mixing fixed-size types into sizeless structs.

				I don't know if we need a 'maybe' or 'unknown' result for cases comparing scaled
				vs. unscaled; I believe the gcc implementation of SVE allows for such
				results, but that supports a generic polynomial length representation.

				My current intention is to rely on functions that clone or copy values to
				check whether they are being used to copy scalable vectors across function
				boundaries without the inherit vlen attribute, and to raise an error there,
				instead of requiring each comparison to be passed the Function that a type
				size comes from. If there's a strong preference for moving the check to the
				size comparison function, let me know; I will be starting work on patches for
				this later in the year if there are no major problems with the idea.