This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Describe memory layout for vectors types
ClosedPublic

Authored by bjope on Jan 19 2021, 5:07 AM.

Details

Summary

There are a couple of caveats when it comes to how vectors are
stored to memory, and thereby also how bitcasts between vector
and integer types work, in LLVM IR, especially in relation to
endianness. This patch is an attempt to document such things.

Event Timeline

markus created this revision. Jan 19 2021, 5:07 AM
markus requested review of this revision. Jan 19 2021, 5:07 AM
bjope added inline comments. Jan 20 2021, 2:54 AM
llvm/docs/LangRef.rst
3161

Maybe it is just confusing to talk about sub-byte sized elements and the C language here? I mean we need to define how it works for sizes larger than a byte as well. And the IR is source language agnostic (even if the motivation here might originate from C).

We could perhaps just state that the layout is packed. And that the vector could be seen as one large iN scalar (N given by the type store size in bits of the vector), with element zero being in the most significant bits for a big-endian target and in the least significant bits for a little-endian target.

I guess it isn't defined where padding goes if the type size is less than the type store size (e.g. <2 x i6> has a type size that is 12 bits, but the type store size is 16 bits).
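The layout suggested above can be illustrated with a small Python model. This is only a sketch of the proposal in this comment, not LLVM code: `vector_as_int` and `store_size_bits` are hypothetical helper names, the element values are chosen for illustration, and the store size is assumed to be the type size rounded up to whole 8-bit bytes.

```python
def vector_as_int(elems, elem_bits, big_endian):
    """Concatenate vector elements into one integer of len(elems) * elem_bits bits.

    Element zero lands in the most significant bits for big-endian and in the
    least significant bits for little-endian.
    """
    mask = (1 << elem_bits) - 1
    value = 0
    # Shift elements in from the most significant end downwards.
    ordered = elems if big_endian else list(reversed(elems))
    for e in ordered:
        value = (value << elem_bits) | (e & mask)
    return value


def store_size_bits(type_bits):
    """Round a type size in bits up to whole 8-bit bytes (assumed rule)."""
    return ((type_bits + 7) // 8) * 8


# A <4 x i4> vector <1, 2, 3, 5> viewed as one i16.
be = vector_as_int([1, 2, 3, 5], 4, big_endian=True)
le = vector_as_int([1, 2, 3, 5], 4, big_endian=False)
print(hex(be), hex(le))  # 0x1235 0x5321

# <2 x i6>: type size is 12 bits, store size is 16 bits.
print(2 * 6, store_size_bits(2 * 6))  # 12 16
```

The model says nothing about where the four padding bits of `<2 x i6>` end up, which matches the open question in the paragraph above.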

10547

I think it might be good to add a caveat here about bitcasts involving vector types. For example, bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian, while element zero ends up in the most significant bits for big-endian.
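A minimal Python sketch of that caveat, modelling the bitcast as a store of the vector followed by a load of the integer. The helper name `bitcast_vec_i8_to_int` and the element values are made up for illustration, and 8-bit bytes are assumed.

```python
def bitcast_vec_i8_to_int(elems, byteorder):
    """Model bitcast of <N x i8> to an integer as a store followed by a load.

    byteorder is "little" or "big" (the assumed target endianness).
    """
    memory = bytes(elems)  # element zero is stored at the lowest address
    return int.from_bytes(memory, byteorder)


vec = [0x11, 0x22]  # <2 x i8> <i8 17, i8 34>
print(hex(bitcast_vec_i8_to_int(vec, "little")))  # 0x2211
print(hex(bitcast_vec_i8_to_int(vec, "big")))     # 0x1122
```

In both cases the bytes in memory are identical; only the significance assigned to each address differs when the i16 is loaded.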

That makes sense to me but before acting on it we should probably wait a while to see if the other reviewers have some feedback.

Hello. The general idea of documenting what llvm does sounds like a good idea. Alive agrees with this too, which is a good sign: https://alive2.llvm.org/ce/z/XbkTEz.

Do we know which backends support big endian? Arm and AArch64 do. Sparc, PPC, Mips, Lanai. It seems like quite a few do.

Right, perhaps we should add maintainers of those targets as reviewers since they may be more interested in documenting endianness differences than the little-endian crowd?

Added code owners for big-endian Sparc, PPC and MIPS as reviewers.

I am not sure if it is desired or even acceptable in the language reference, but my experience is that a diagram goes a long way towards explaining this. I've had to teach countless new developers here at IBM about the two vector layouts (since PPC supports both).
Something like this tends to resonate with developers:

Use a <4 x i32> vector as an example:
Memory:             Register(LE):          Register(BE):
 0x0 0x4 0x8 0xC     3  2  1  0             0  1  2  3
[A,  B,  C,  D]     [D, C, B, A]           [A, B, C, D]

It shows both the relationship between the numbering of bytes in memory and in the vector, and the layout of the elements in the register.

llvm/docs/LangRef.rst
3161

I like the idea of comparing a vector to a scalar of the same width and stating where the elements are placed in terms of bit significance.

10547

+1

bjope commandeered this revision. Mar 16 2021, 7:33 AM
bjope edited reviewers, added: markus; removed: bjope.

I'll try to make some progress here.

bjope updated this revision to Diff 330983. Mar 16 2021, 7:36 AM

Changed the wording quite a bit. Getting rid of references to C ABI etc.

bjope retitled this revision from Describe vector layout in LangRef to [LangRef] Describe memory layout for vectors types. Mar 16 2021, 7:37 AM
bjope edited the summary of this revision.
dmgreen added inline comments. Mar 17 2021, 1:16 AM
llvm/docs/LangRef.rst
3165

together

3177

Does this apply to a v4i1? I thought that worked the same way as any other i1 type. The defined bits end up in the MSBs.

bjope added inline comments. Mar 17 2021, 2:19 AM
llvm/docs/LangRef.rst
3177

Does this apply to a v4i1? I thought that worked the same way as any other i1 type. The defined bits end up in the MSBs.

I did not know about such rules for i1 (or other non-byte-sized first class types). Is that really specified somewhere?

The description for store, https://llvm.org/docs/LangRef.html#store-instruction , says that "When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten.". That is not really saying anything about where the padding bits are placed either. I've assumed that the placement is unspecified as well (as I've never seen any definition).

dmgreen added inline comments.
llvm/docs/LangRef.rst
3177

I may be wrong about the MSB. It will already be used in certain parts of llvm though: if we have a <X x i1> masked load that is scalarized, it will bitcast the predicate to an iX.

Alive defines it like this:
https://alive2.llvm.org/ce/z/w5BhQa

Which llc seems to agree with, from the mov r0, #4:
https://godbolt.org/z/7Tx6aM

But it was the masked load scalarization that was being fixed in D94765, so it may be worth pinning down the meaning.

bjope added inline comments. Mar 17 2021, 9:53 AM
llvm/docs/LangRef.rst
3177

I don't think the result from a backend (or even Alive in this case) really says whether it is defined in the IR. I believe a target is likely to define where the padding goes when loading/storing non-byte-sized types, but LLVM does not know about it. Transformations on LLVM IR should therefore be extra careful when handling types with different "type size" and "type store size" (and several passes, for example, use DataLayout::typeSizeEqualsStoreSize to avoid certain transformations).

Here is an example using "opt -O3" that shows differences between little/big endian, and also that opt isn't able to simplify your example "src2" with <4 x i1>:
https://godbolt.org/z/rvMxd1

Another way to see it is that you may bitcast <4 x i1> to i4 (from one first class non-agg type to another one with the same size), but you can't bitcast i4 to i8 (and bitcast is basically defined as a store (using the src type) followed by a load (using the dst type)).

Overall LGTM. Thanks for documenting this! It was painful to reverse-engineer this when implementing it in Alive2..

My only concern is about leaving vector sizes not multiple of a byte as unspecified. Doing so is the same as saying those are illegal; can't be used in practice. Can we define those as being padded, where the padding is poison, like in structs? That allows using vectors where padding isn't easy, like <8 x i3>. How do you store that to memory if the direct store is left unspecified? And with people using weird integer sizes for ML & FPGAs, I think it's a good idea to define this case.
Alive2 is being conservative here and defining the padding as zero, but it should be poison IMHO.

I like the idea of ASCII-Art. My mind works better with examples and "pictures" than with long, hard-to-read text.

Clang introduced _ExtInt exactly for those people:
http://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html

What is a vector of _ExtInt(6)?

bjope added a comment. Mar 17 2021, 1:53 PM

Overall LGTM. Thanks for documenting this! It was painful to reverse-engineer this when implementing it in Alive2..

My only concern is about leaving vector sizes not multiple of a byte as unspecified. Doing so is the same as saying those are illegal; can't be used in practice. Can we define those as being padded, where the padding is poison, like in structs? That allows using vectors where padding isn't easy, like <8 x i3>. How do you store that to memory if the direct store is left unspecified? And with people using weird integer sizes for ML & FPGAs, I think it's a good idea to define this case.
Alive2 is being conservative here and defining the padding as zero, but it should be poison IMHO.

The idea I used for describing the layout of the vectors was to base it on the fact that bitcast is allowed between first class (non-agg) types of the same size. And bitcast is defined as doing store/load. IMO the reasoning for padding should be the same for vectors and scalars that have a store size that is larger than the type size (for example, <3 x i3> should be handled just like i9).

A type such as i9 has a type store size of 16 bits according to DataLayout. So when doing a store i9 it could be seen as if 16 bits are written. The content of the seven padding bits is not defined after the store, so saying that they contain "poison" is probably ok. As long as you load/store the same size things should be ok. If you for example do store i9, followed by load i9, then you should get the same i9 value back. Similarly if you store i9, load <3 x i3> and bitcast to i9, then you should also get the same result back. But I also believe that you could store i9, load i16, store i16, load i9 and get the same result back (so that would copy the padding). I figure the latter is similar to what happens when doing memcpy on a struct (if the padding in the struct is defined as poison).
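The round trip described above can be sketched in Python. This is only a model under a loudly stated assumption: the i9 value is placed in the low 9 bits of the 16 stored bits, with the 7 padding bits above it. As discussed in this thread, the actual position of the padding is not defined by the language reference, and the helper names `store_i9` and `load_i9` are hypothetical.

```python
import random

VALUE_BITS = 9
STORE_BITS = 16
VALUE_MASK = (1 << VALUE_BITS) - 1


def store_i9(value):
    """Return the 16 stored bits: the i9 value plus 7 padding bits of arbitrary content."""
    padding = random.getrandbits(STORE_BITS - VALUE_BITS)
    # Assumption: padding sits above the value bits; this position is not specified.
    return (padding << VALUE_BITS) | (value & VALUE_MASK)


def load_i9(stored):
    """Read back only the 9 value bits, ignoring whatever the padding holds."""
    return stored & VALUE_MASK


original = 0b1_0110_0101
stored = store_i9(original)  # store i9
as_i16 = stored              # load i16: sees value bits and padding bits
copied = as_i16              # store i16: copies all 16 bits, padding included
print(load_i9(copied) == original)  # True
```

The padding is modelled as random bits to stress that nothing may depend on its content; the i9 value survives because the copy through i16 preserves every stored bit.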

However, what I referred to when writing "unspecified" was that the position of the padding bits isn't described (and defined) by the language reference. Afaict different targets are allowed to put the padding at different positions, not only depending on endianness, when the type size is smaller than the type store size. If the position of the padding needs to be defined, then it will be interesting to find out how different targets are doing it. There could even be some target out there that for example is padding differently for i4 and i12. Maybe it has to be defined by the DataLayout somehow, if it is needed.

Btw, the target I'm working with is one of those weird targets with support for vectors with non-byte-sized elements. It would probably have been a lot simpler for us if each vector element was padded to the size of a byte when storing the vector to memory (similar to how it works for arrays), but that is not how it works in LLVM IR (although the load/store in the DSP is padding between each element when dealing with such vectors).
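To make the difference between the packed vector layout and the per-element padding of arrays concrete, here is a small Python sketch. The function names are hypothetical, and the array model assumes that an element's allocation size is simply its bit width rounded up to whole 8-bit bytes; a real DataLayout may add further alignment.

```python
def bytes_for(bits):
    """Round a bit count up to whole 8-bit bytes."""
    return (bits + 7) // 8


def packed_vector_bytes(num_elems, elem_bits):
    """Store size of <num_elems x iN> with no padding between elements."""
    return bytes_for(num_elems * elem_bits)


def array_bytes(num_elems, elem_bits):
    """Size of [num_elems x iN] when every element is padded to whole bytes.

    Assumption: no extra alignment beyond rounding each element up to bytes.
    """
    return num_elems * bytes_for(elem_bits)


print(packed_vector_bytes(8, 3))  # 3
print(array_bytes(8, 3))          # 8
```

Under these assumptions `<8 x i3>` occupies 3 bytes while `[8 x i3]` occupies 8, which is why element access in the packed vector involves bit offsets rather than byte offsets.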

nlopes accepted this revision. Mar 18 2021, 3:15 AM

Thanks for expanding on the padding. Sounds good to me.
LGTM.

llvm/docs/LangRef.rst
3158

isn't -> aren't

This revision is now accepted and ready to land. Mar 18 2021, 3:15 AM
bjope updated this revision to Diff 331932. Mar 19 2021, 10:54 AM
bjope edited the summary of this revision.

Minor update (using a bit more graphical examples instead of only text, even though it isn't exactly ascii art as someone suggested).

Also made some clarification related to the unspecifiedness of non-byte-sized stores. Thought it might be nice to say something more about it, as it was asked about in the review.

This revision was landed with ongoing or failed builds. Mar 19 2021, 11:01 AM
This revision was automatically updated to reflect the committed changes.
danilaml added inline comments.
llvm/docs/LangRef.rst
3204

Why is the least significant byte placed at the largest memory address here, if it's little endian, just like for big endian?
Shouldn't it be reversed?

bjope added inline comments. Jul 27 2023, 12:40 AM
llvm/docs/LangRef.rst
3204

@danilaml : Right, that indeed looks like a typo.

store i16 0x5321, i16* %ptr

would of course put the 0x21 at the lower memory address. So this should say

;    [%ptr + 0]: 00100001  (0x21)
;    [%ptr + 1]: 01010011  (0x53)
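The corrected byte order can be double-checked with a few lines of Python. This is just a sanity check using Python's own integer-to-bytes conversion, assuming 8-bit bytes; it is not part of the patch.

```python
value = 0x5321

for byteorder in ("little", "big"):
    stored = value.to_bytes(2, byteorder)
    print(byteorder)
    for offset, byte in enumerate(stored):
        # Same format as the comment lines in the LangRef example.
        print(f";    [%ptr + {offset}]: {byte:08b}  (0x{byte:02x})")
```

For "little" the byte at offset 0 is 0x21 and the byte at offset 1 is 0x53, matching the corrected lines above.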
danilaml added inline comments. Jul 27 2023, 12:06 PM
llvm/docs/LangRef.rst
3204

@bjope should I submit the fix as NFC or will you do it?

bjope added inline comments. Jul 27 2023, 4:04 PM
llvm/docs/LangRef.rst
3204
danilaml added inline comments. Jul 27 2023, 5:44 PM
llvm/docs/LangRef.rst
3204

Great, thanks!