Describe vector layout in LangRef
Needs Review · Public

Authored by markus on Jan 19 2021, 5:07 AM.



As far as I can tell, the layout of IR vectors (especially those with sub-byte sized elements) has not been described in the documentation. This patch tries to address that using material from the comments where the decision appears to have been taken.

Event Timeline

markus created this revision. Jan 19 2021, 5:07 AM
markus requested review of this revision. Jan 19 2021, 5:07 AM
Herald added a project: Restricted Project. Jan 19 2021, 5:07 AM
bjope added inline comments. Jan 20 2021, 2:54 AM

Maybe it is just confusing to talk about sub-byte sized elements and the C language here? I mean, we need to define how it works for sizes larger than a byte as well. And the IR is source-language agnostic (even if the motivation here might originate from C).

We could perhaps just state that the layout is packed. And that the vector could be seen as one large iN scalar (N given by the type store size in bits of the vector), with element zero being in the most significant bits for a big-endian target and in the least significant bits for a little-endian target.
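To make the "one large iN scalar" view concrete, here is a minimal Python sketch. The helper name `vector_as_scalar` is hypothetical, and it assumes N is simply the element count times the element width, so it says nothing about padding bits.

```python
def vector_as_scalar(elems, elem_bits, big_endian):
    """View a packed vector as a single integer of len(elems) * elem_bits bits.

    Hypothetical helper for illustration; padding bits are not modelled.
    """
    n = len(elems)
    mask = (1 << elem_bits) - 1
    value = 0
    for i, e in enumerate(elems):
        # Element 0 takes the most significant slot on big-endian targets
        # and the least significant slot on little-endian targets.
        slot = (n - 1 - i) if big_endian else i
        value |= (e & mask) << (slot * elem_bits)
    return value


# <4 x i4> with elements 1, 2, 3, 4 viewed as an i16
little = vector_as_scalar([1, 2, 3, 4], 4, big_endian=False)
big = vector_as_scalar([1, 2, 3, 4], 4, big_endian=True)
print(hex(little), hex(big))
```

With sub-byte elements the same rule applies unchanged, which is the point of describing the layout in terms of bit significance rather than byte addresses.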

I guess it isn't defined where padding goes if the type size is less than the type store size (e.g. <2 x i6> has a type size that is 12 bits, but the type store size is 16 bits).
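The arithmetic behind that example can be sketched as follows. The function names are hypothetical, and the sketch assumes the store size is the type size rounded up to a whole number of bytes.

```python
def type_size_bits(num_elems, elem_bits):
    # Packed layout: the elements sit back to back with no gaps.
    return num_elems * elem_bits


def type_store_size_bits(num_elems, elem_bits):
    # Assumption: store size is the type size rounded up to whole bytes.
    bits = type_size_bits(num_elems, elem_bits)
    return ((bits + 7) // 8) * 8


# <2 x i6>
size = type_size_bits(2, 6)
store = type_store_size_bits(2, 6)
print(size, store, store - size)  # type size, store size, padding bits
```

For <2 x i6> this leaves 4 bits unaccounted for, and where those bits sit is the open question.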


I think it might be good to add a caveat here about bitcasts involving vector types. For example, that bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian, while element zero ends up in the most significant bits for big-endian.
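A small Python sketch of that caveat, modelling the bitcast as a store of the vector followed by a load of the i16. The function name is hypothetical, and the sketch assumes element zero is stored at the lowest address on both kinds of target.

```python
def bitcast_v2i8_to_i16(elems, byteorder):
    """Model bitcast <2 x i8> to i16 as store-then-load (illustrative only)."""
    # Assumption: element 0 lands at the lowest address regardless of endianness.
    memory = bytes(elems)
    # Reading the same two bytes back as an i16 depends on the byte order.
    return int.from_bytes(memory, byteorder)


vec = [0x11, 0x22]  # element 0 is 0x11
print(hex(bitcast_v2i8_to_i16(vec, "little")))  # element 0 in the low byte
print(hex(bitcast_v2i8_to_i16(vec, "big")))     # element 0 in the high byte
```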

That makes sense to me but before acting on it we should probably wait a while to see if the other reviewers have some feedback.

Hello. The general idea of documenting what llvm does sounds like a good idea. Alive agrees with this too, which is a good sign:

Do we know which backends support big endian? Arm and AArch64 do, as do Sparc, PPC, Mips and Lanai. It seems like quite a few do.

Right, perhaps we should add maintainers of those targets as reviewers since they may be more interested in documenting endianness differences than the little-endian crowd?

Added code owners for big-endian Sparc, PPC and MIPS as reviewers.

I am not sure if it is desired or even acceptable in the language reference, but my experience is that a diagram goes a long way towards explaining this. I've had to teach countless new developers here at IBM about the two vector layouts (since PPC supports both).
Something like this tends to resonate with developers:

Use a <4 x i32> vector as an example:
Memory:             Register(LE):          Register(BE):
 0x0 0x4 0x8 0xC     3  2  1  0             0  1  2  3
[A,  B,  C,  D]     [D, C, B, A]           [A, B, C, D]

It shows both the relationship between the numbering of bytes in memory and in the vector, and the layout of the elements in the register.
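For anyone who prefers to poke at it, the diagram can be reproduced with a short Python sketch. The function name `register_view` is hypothetical, and the sketch treats the register as one 128-bit integer loaded from memory with the target's byte order, listing lanes from most significant to least significant.

```python
import struct


def register_view(elems, byteorder):
    """Store a <4 x i32> to memory, load it as one 128-bit value, and list
    the 32-bit lanes from most significant to least significant."""
    fmt = ("<" if byteorder == "little" else ">") + "4I"
    memory = struct.pack(fmt, *elems)  # element 0 at address 0x0
    reg = int.from_bytes(memory, byteorder)
    return [(reg >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


A, B, C, D = 0xA, 0xB, 0xC, 0xD
print([hex(x) for x in register_view([A, B, C, D], "little")])  # D, C, B, A
print([hex(x) for x in register_view([A, B, C, D], "big")])     # A, B, C, D
```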


I like the idea of comparing a vector to a scalar of the same width and stating where the elements are placed in terms of bit significance.