This is an archive of the discontinued LLVM Phabricator instance.


Differential D94964

[LangRef] Describe memory layout for vectors types
ClosedPublic

Authored by bjope on Jan 19 2021, 5:07 AM.

Details

Reviewers

uweigand
efriedma
dmgreen
nemanjai
venkatra
atanasyan
markus
nlopes
aqjune

Commits

rG5737010a7948: [LangRef] Describe memory layout for vectors types

Summary

There are a couple of caveats when it comes to how vectors are
stored to memory, and thereby also how bitcasts between vector
and integer types work, in LLVM IR, especially in relation to
endianness. This patch is an attempt to document such things.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

markus created this revision.Jan 19 2021, 5:07 AM

Herald added a subscriber: jdoerfert. Jan 19 2021, 5:07 AM

markus requested review of this revision.Jan 19 2021, 5:07 AM

Herald added a project: Restricted Project. Jan 19 2021, 5:07 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B85704: Diff 317529.Jan 19 2021, 6:02 AM

bjope added inline comments.Jan 20 2021, 2:54 AM

llvm/docs/LangRef.rst
3210	Maybe it is just confusing to talk about sub-byte sized elements and C language here? I mean we need to define how it works for sizes larger than a byte as well. And the IR is source language agnostic (even if the motivation here might originate from C). We could perhaps just state that the layout is packed. And that the vector could be seen as one large iN scalar (N given by the type store size in bits of the vector), with element zero being in the most significant bits for a big-endian target and in the least significant bits for a little-endian target. I guess it isn't defined where padding goes if the type size is less than the type store size (e.g. <2 x i6> has a type size that is 12 bits, but the type store size is 16 bits).
10663–10668	I think it might be good to add a caveat here about bitcasts involving vector types. For example that `bitcast <2 x i8> to i16` puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.
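
To make the "one large iN scalar" view above concrete, here is a minimal Python sketch. It is not LLVM code: the helper name `bitcast_vector_to_int` is hypothetical, and it only models the case where the vector's type size equals its type store size, so no padding is involved.

```python
def bitcast_vector_to_int(elements, elem_bits, big_endian):
    """Model `bitcast <N x iM> to i(N*M)`: pack elements with no padding.

    Hypothetical helper for illustration only; not an LLVM API.
    """
    mask = (1 << elem_bits) - 1
    count = len(elements)
    value = 0
    for index, element in enumerate(elements):
        # Little-endian: element 0 lands in the least significant bits.
        # Big-endian: element 0 lands in the most significant bits.
        if big_endian:
            shift = (count - 1 - index) * elem_bits
        else:
            shift = index * elem_bits
        value |= (element & mask) << shift
    return value


# bitcast <2 x i8> <i8 0x12, i8 0x34> to i16
le_i16 = bitcast_vector_to_int([0x12, 0x34], 8, big_endian=False)
be_i16 = bitcast_vector_to_int([0x12, 0x34], 8, big_endian=True)
print(hex(le_i16), hex(be_i16))
```

The same helper applied to `<4 x i4> <i4 1, i4 2, i4 3, i4 5>` gives 0x1235 for big-endian and 0x5321 for little-endian.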

That makes sense to me but before acting on it we should probably wait a while to see if the other reviewers have some feedback.

Hello. The general idea of documenting what llvm does sounds like a good idea. Alive agrees with this too, which is a good sign: https://alive2.llvm.org/ce/z/XbkTEz.

Do we know which backends support big endian? Arm and AArch64 do. Sparc, PPC, Mips, Lanai. It seems like quite a few do.

Right, perhaps we should add maintainers of those targets as reviewers since they may be more interested in documenting endianness differences than the little-endian crowd?

markus mentioned this in D94765: Expand masked mem intrinsics correctly wrt big-endian.Jan 29 2021, 2:47 AM

Added code owners for big-endian Sparc, PPC and MIPS as reviewers.

I am not sure if it is desired or even acceptable in the language reference, but my experience is that a diagram goes a long way towards explaining this. I've had to teach countless new developers here at IBM about the two vector layouts (since PPC supports both).
Something like this tends to resonate with developers:

Use a <4 x i32> vector as an example:
Memory:             Register(LE):          Register(BE):
 0x0 0x4 0x8 0xC     3  2  1  0             0  1  2  3
[A,  B,  C,  D]     [D, C, B, A]           [A, B, C, D]

As it shows both the relationship of the numbering of bytes in memory and the vector and the layout of the elements in the register.
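
A small Python sketch of the memory half of that diagram, under the assumption that a `<4 x i32>` is laid out like an array of four i32 values. The function name `store_v4i32` is made up for illustration, and the register lane numbering in the diagram is target-specific and is not modeled here. Reading the 16 bytes back as one 128-bit integer stands in for a bitcast to i128.

```python
import struct


def store_v4i32(elements, big_endian):
    """Lay out four i32 elements in memory the way an array would be."""
    byte_order = ">" if big_endian else "<"
    return struct.pack(byte_order + "4I", *elements)


A, B, C, D = 0xA, 0xB, 0xC, 0xD

le_memory = store_v4i32([A, B, C, D], big_endian=False)
be_memory = store_v4i32([A, B, C, D], big_endian=True)

# Element k always starts at byte offset 4 * k, whatever the endianness.
le_at_4 = struct.unpack_from("<I", le_memory, 4)[0]
be_at_4 = struct.unpack_from(">I", be_memory, 4)[0]

# Reading all 16 bytes back as one integer models a bitcast to i128.
le_int = int.from_bytes(le_memory, "little")
be_int = int.from_bytes(be_memory, "big")

print(f"LE i128: 0x{le_int:032x}")
print(f"BE i128: 0x{be_int:032x}")
```

In this model element A sits at address 0x0 in both cases, but it ends up in the least significant 32 bits of the i128 on little-endian and in the most significant 32 bits on big-endian.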

llvm/docs/LangRef.rst
3210	I like the idea of comparing a vector to a scalar of the same width and stating where the elements are placed in terms of bit significance.
10663–10668	+1

dmgreen mentioned this in rG9498315c9ba3: Expand masked mem intrinsics correctly wrt big-endian.Feb 11 2021, 1:00 AM

I'll try to make some progress here.

Changed the wording quite a bit. Getting rid of references to C ABI etc.

bjope retitled this revision from Describe vector layout in LangRef to [LangRef] Describe memory layout for vectors types.Mar 16 2021, 7:37 AM

bjope edited the summary of this revision.

Harbormaster completed remote builds in B94051: Diff 330983.Mar 16 2021, 8:18 AM

dmgreen added inline comments.Mar 17 2021, 1:16 AM

llvm/docs/LangRef.rst
3214	together
3226	Does this apply to a v4i1? I thought that worked the same way as any other i1 type. The defined bits end up in the MSBs.

bjope added inline comments.Mar 17 2021, 2:19 AM

llvm/docs/LangRef.rst
3226	> Does this apply to a v4i1? I thought that worked the same way as any other i1 type. The defined bits end up in the MSBs.

I did not know about such rules for i1 (or other non-byte-sized first class types). Is that really specified somewhere? The description for `store`, https://llvm.org/docs/LangRef.html#store-instruction , says that "When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten.". That is not really saying anything about where the padding bits are placed either. I've assumed that the placement is unspecified as well (as I've never seen any definition).

dmgreen added reviewers: nlopes, aqjune.Mar 17 2021, 7:52 AM

dmgreen added inline comments.

llvm/docs/LangRef.rst
3226	I may be wrong about the MSB. It will already be used in certain parts of llvm though, if we have a <X x i1> masked load that is scalarized, it will bitcast the predicate to a iX. Alive defines it like this: https://alive2.llvm.org/ce/z/w5BhQa Which llc seems to agree with, from the mov r0, #4: https://godbolt.org/z/7Tx6aM But it was the masked load scalarization that was being fixed in D94765, so it may be worth pinning down the meaning.

bjope added inline comments.Mar 17 2021, 9:53 AM

llvm/docs/LangRef.rst
3226	I don't think the result from a backend (even alive in this case) really says if it is defined in the IR. I believe a target is likely to define where the padding goes if loading/storing non-byte-sized types, but LLVM does not know about it. Transformations on LLVM IR should therefore be extra careful when handling types with different "type size" and "type store size" (and several passes for example use `DataLayout::typeSizeEqualsStoreSize` to avoid certain transformations). Here is an example using "opt -O3" that shows differences between little/big endian, and also that opt isn't able to simplify your example "src2" with `<4 x i1>`: https://godbolt.org/z/rvMxd1 Another way to see it is that you may bitcast <4 x i1> to i4 (from one first class non-agg type to another one with the same size), but you can't bitcast i4 to i8 (and bitcast is basically defined as a store (using the src type) followed by a load (using the dst type)).
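
The "type size" versus "type store size" distinction can be sketched in a few lines of Python. This is an illustrative model assuming 8-bit bytes, not the `DataLayout` implementation; the function names below are made up and only loosely mirror the LLVM query mentioned above.

```python
def type_size_in_bits(num_elements, elem_bits):
    """Type size of <N x iM> (or of iM when num_elements is 1)."""
    return num_elements * elem_bits


def type_store_size_in_bits(size_in_bits):
    """Round the type size up to a whole number of 8-bit bytes."""
    return (size_in_bits + 7) // 8 * 8


def size_equals_store_size(size_in_bits):
    """True when storing the type leaves no padding bits."""
    return size_in_bits == type_store_size_in_bits(size_in_bits)


examples = {
    "<2 x i6>": type_size_in_bits(2, 6),
    "<4 x i1>": type_size_in_bits(4, 1),
    "<4 x i4>": type_size_in_bits(4, 4),
    "i9": type_size_in_bits(1, 9),
}
for name, bits in examples.items():
    store_bits = type_store_size_in_bits(bits)
    print(f"{name}: size {bits}, store size {store_bits}, "
          f"padding {store_bits - bits}")
```

Only `<4 x i4>` has matching sizes here; for the other types the position of the padding bits is exactly what the discussion says is left unspecified.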

Overall LGTM. Thanks for documenting this! It was painful to reverse-engineer this when implementing it in Alive2..

My only concern is about leaving vector sizes not multiple of a byte as unspecified. Doing so is the same as saying those are illegal; can't be used in practice. Can we define those as being padded, where the padding is poison, like in structs? That allows using vectors where padding isn't easy, like <8 x i3>. How do you store that to memory if the direct store is left unspecified? And with people using weird integer sizes for ML & FPGAs, I think it's a good idea to define this case.
Alive2 is being conservative here and defining the padding as zero, but it should be poison IMHO.

I like the idea of ASCII-Art. My mind works better with examples and "pictures" than with long hard to read text.

Clang introduced _ExtInt exactly for those people:
http://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html

What is a vector of _ExtInt(6)?

In D94964#2632569, @nlopes wrote:

Overall LGTM. Thanks for documenting this! It was painful to reverse-engineer this when implementing it in Alive2..

My only concern is about leaving vector sizes not multiple of a byte as unspecified. Doing so is the same as saying those are illegal; can't be used in practice. Can we define those as being padded, where the padding is poison, like in structs? That allows using vectors where padding isn't easy, like <8 x i3>. How do you store that to memory if the direct store is left unspecified? And with people using weird integer sizes for ML & FPGAs, I think it's a good idea to define this case.
Alive2 is being conservative here and defining the padding as zero, but it should be poison IMHO.

The idea I used for describing the layout of the vectors was to base it on the fact that bitcast is allowed between first class (non-agg) types of the same size. And bitcast is defined as doing store/load. IMO the reasoning for padding should be the same for vectors and scalars that have a store size that is larger than the type size (for example, <3 x i3> should be handled just like i9).

A type such as i9 has a type store size of 16 bits according to DataLayout. So when doing a store i9 it could be seen as if 16 bits are written. The content of the seven padding bits is not defined after the store, so saying that they contain "poison" is probably ok. As long as you load/store the same size things should be ok. If you for example do store i9, followed by load i9, then you should get the same i9 value back. Similarly if you store i9, load <3 x i3> and bitcast to i9, then you should also get the same result back. But I also believe that you could store i9, load i16, store i16, load i9 and get the same result back (so that would copy the padding). I figure the latter is similar to what happens when doing memcpy on a struct (if the padding in the struct is defined as poison).

However, what I referred to when writing "unspecified" was that the position of the padding bits isn't described (and defined) by the language reference. Afaict different targets are allowed to put the padding at different positions, not only depending on endianness, when the type size is smaller than the type store size. If the position of the padding needs to be defined, then it will be interesting to find out how different targets are doing it. There could even be some target out there that for example is padding differently for i4 and i12. Maybe it has to be defined by the DataLayout somehow, if it is needed.

Btw, the target I'm working with is one of those weird targets with support for vectors with non-byte-sized elements. It would probably have been a lot simpler for us if each vector element was padded to the size of a byte when storing the vector to memory (similar to how it works for arrays), but that is not how it works in LLVM IR (although the load/store in the DSP is padding between each element when dealing with such vectors).

Thanks for expanding on the padding. Sounds good to me.
LGTM.

llvm/docs/LangRef.rst
3207	isn't -> aren't

This revision is now accepted and ready to land.Mar 18 2021, 3:15 AM

Minor update (using a bit more graphical examples instead of only text, even though it isn't exactly ascii art as someone suggested).

Also made some clarification related to the unspecifiedness of non-byte-sized stores. Thought it might be nice to say something more about it, as it was asked about in the review.

This revision was landed with ongoing or failed builds.Mar 19 2021, 11:01 AM

Closed by commit rG5737010a7948: [LangRef] Describe memory layout for vectors types (authored by bjope).

This revision was automatically updated to reflect the committed changes.

bjope added a commit: rG5737010a7948: [LangRef] Describe memory layout for vectors types.

Harbormaster completed remote builds in B94744: Diff 331932.Mar 19 2021, 12:31 PM

mingmingl mentioned this in D133850: [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases.Sep 14 2022, 9:57 AM

danilaml added a subscriber: danilaml.Jul 26 2023, 7:26 AM

danilaml added inline comments.

llvm/docs/LangRef.rst
3253	Why is the least significant byte placed at the largest memory address here, if it's little endian, just like for big endian? Shouldn't it be reversed?

Herald added a project: Restricted Project. Jul 26 2023, 7:26 AM

bjope added inline comments.Jul 27 2023, 12:40 AM

llvm/docs/LangRef.rst
3253	@danilaml : Right, that indeed looks like a typo. `store i16 0x5321, i16* %ptr` would of course put the 0x21 at the lower memory address. So this should say:
; [%ptr + 0]: 00100001 (0x21)
; [%ptr + 1]: 01010011 (0x53)
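
The corrected byte order can be checked with a short Python sketch that models the i16 store using `int.to_bytes`. This is an illustration with 8-bit addressing assumed, and the variable names are made up.

```python
# %val after the little-endian bitcast of <i4 1, i4 2, i4 3, i4 5>
little_val = 0x5321
# %val after the big-endian bitcast of the same vector
big_val = 0x1235

# Model `store i16 %val, i16* %ptr` on each kind of target.
little_bytes = little_val.to_bytes(2, "little")
big_bytes = big_val.to_bytes(2, "big")

for offset, byte in enumerate(little_bytes):
    print(f"; [%ptr + {offset}]: {byte:08b} (0x{byte:02x})")
```

For the little-endian case this places 0x21 at `%ptr + 0` and 0x53 at `%ptr + 1`, while the big-endian store of 0x1235 places 0x12 first and 0x35 second.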

danilaml added inline comments.Jul 27 2023, 12:06 PM

llvm/docs/LangRef.rst
3253	@bjope should I submit the fix as NFC or will you do it?

bjope mentioned this in rG9a53fe50f435: [LangRef] Fix typo in example describing memory layout of a vector. NFC.Jul 27 2023, 4:01 PM

bjope added inline comments.Jul 27 2023, 4:04 PM

llvm/docs/LangRef.rst
3253	I've pushed a fixup here: https://reviews.llvm.org/rG9a53fe50f4355e6dfcd6af534cb394a62128963b

danilaml added inline comments.Jul 27 2023, 5:44 PM

llvm/docs/LangRef.rst
3253	Great, thanks!

Revision Contents

Path: llvm/docs/LangRef.rst
Size: 68 lines

Diff 331940

llvm/docs/LangRef.rst


	[3,194 lines not shown]
	A vector type is a simple derived type that represents a vector of
	elements. Vector types are used when multiple primitive data are
	operated in parallel using a single instruction (SIMD). A vector type
	requires a size (number of elements), an underlying primitive data type,
	and a scalable property to represent vectors where the exact hardware
	vector length is unknown at compile time. Vector types are considered
	:ref:`first class <t_firstclass>`.

				:Memory Layout:

				In general vector elements are laid out in memory in the same way as
				:ref:`array types <t_array>`. Such an analogy works fine as long as the vector
				elements are byte sized. However, when the elements of the vector aren't byte
				sized it gets a bit more complicated. One way to describe the layout is by
				describing what happens when a vector such as <N x iM> is bitcasted to an
				integer type with N*M bits, and then following the rules for storing such an
				integer to memory.

				A bitcast from a vector type to a scalar integer type will see the elements
				being packed together (without padding). The order in which elements are
				inserted in the integer depends on endianness. For little endian element zero
				is put in the least significant bits of the integer, and for big endian
				element zero is put in the most significant bits.

				Using a vector such as ``<i4 1, i4 2, i4 3, i4 5>`` as an example, together
				with the analogy that we can replace a vector store by a bitcast followed by
				an integer store, we get this for big endian:

				.. code-block:: llvm

				%val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

				; Bitcasting from a vector to an integral type can be seen as
				; concatenating the values:
				; %val now has the hexadecimal value 0x1235.

				store i16 %val, i16* %ptr

				; In memory the content will be (8-bit addressing):
				;
				; [%ptr + 0]: 00010010 (0x12)
				; [%ptr + 1]: 00110101 (0x35)

				The same example for little endian:

				.. code-block:: llvm

				%val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

				; Bitcasting from a vector to an integral type can be seen as
				; concatenating the values:
				; %val now has the hexadecimal value 0x5321.

				store i16 %val, i16* %ptr

				; In memory the content will be (8-bit addressing):
				;
				; [%ptr + 0]: 01010011 (0x53)
				; [%ptr + 1]: 00100001 (0x21)

				When ``<N*M>`` isn't evenly divisible by the byte size the exact memory layout
				is unspecified (just like it is for an integral type of the same size). This
				is because different targets could put the padding at different positions when
				the type size is smaller than the type's store size.

	:Syntax:

	::

	< <# elements> x <elementtype> > ; Fixed-length vector
	< vscale x <# elements> x <elementtype> > ; Scalable vector

	The number of elements is a constant integer value larger than 0;
	[7,387 lines not shown]
	The '``bitcast``' instruction converts ``value`` to type ``ty2``. It
	is always a no-op cast because no bits change with this
	conversion. The conversion is done as if the ``value`` had been stored
	to memory and read back as type ``ty2``. Pointer (or vector of
	pointers) types may only be converted to other pointer (or vector of
	pointers) types with the same address space through this instruction.
	To convert pointers to other types, use the :ref:`inttoptr <i_inttoptr>`
	or :ref:`ptrtoint <i_ptrtoint>` instructions first.

				There is a caveat for bitcasts involving vector types in relation to
				endianness. For example ``bitcast <2 x i8> <value> to i16`` puts element zero
				of the vector in the least significant bits of the i16 for little-endian while
				element zero ends up in the most significant bits for big-endian.

	Example:
	""""""""

	.. code-block:: text

	%X = bitcast i8 255 to i8 ; yields i8 :-1
	%Y = bitcast i32* %x to sint* ; yields sint*:%x
	%Z = bitcast <2 x int> %V to i64; ; yields i64: %V			%Z = bitcast <2 x int> %V to i64; ; yields i64: %V (depends on endianess)
	%Z = bitcast <2 x i32> %V to <2 x i64> ; yields <2 x i64*>

	.. _i_addrspacecast:

	'``addrspacecast .. to``' Instruction
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:
	[11,180 lines not shown]