This is an archive of the discontinued LLVM Phabricator instance.

ABISysV_arm64: compute return value for large vectors correctly
ClosedPublic

Authored by labath on May 3 2017, 10:13 AM.

Details

Summary

Arm64 Procedure Call Standard specifies than only vectors up to 16 bytes
are stored in v0 (which makes sense, as that's the size of the
register). 32-byte vector types are passed as regular structs via x8
pointer. Treat them as such.

This fixes TestReturnValue for arm64-clang. I also split the test case
into two so I can avoid the if(gcc) line, and annotate each test
instead. (It seems the vector type tests fail with gcc only when
targetting x86 arches).

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.May 3 2017, 10:13 AM
tberghammer edited edge metadata.May 4 2017, 6:48 AM

I am a bit confused by the correlation between your change and commit message. In the commit message you say that 32 byte structs are passed as x8 pointers but the implementation of LoadValueFromConsecutiveGPRRegisters seems to read it out from the v0-v8 registers for vectors of up to 8 elements independently of there size. Also based on that code I have the suspicion that the first branch (where byte_size <= 16) is not actually used or necessary and also I don't see anything in the ABI documentation indicating otherwise (it would be a pretty crazy ABI if they say that if you have 4 double then passed in a single 32 byte register while if you have 8 double then passed in 8 different 32 byte registers). Can you make sure that branch is necessary (e.g. removing it breaks at least 1 test)?

packages/Python/lldbsuite/test/functionalities/return-value/TestReturnValue.py
194 ↗(On Diff #97676)

(nit): Not needed

labath added a comment.May 4 2017, 7:16 AM

I am a bit confused by the correlation between your change and commit message. In the commit message you say that 32 byte structs

I mean 32-byte vectors. I.e. variables declared as float foo __attribute__((__vector_size__(32)));

are passed as x8 pointers but the implementation of LoadValueFromConsecutiveGPRRegisters seems to read it out from the v0-v8 registers for vectors of up to 8 elements independently of there size.

LoadValueFromConsecutiveGPRRegisters does this for "homogeneous structs", which is a different concept than vector: """Note that for short-vector types the fundamental types are 64-bit vector and 128-bit vector; the type of the elements in the short vector does not form part of the test for homogeneity. """

So an 8-byte and 16-byte vector (and probably structures containing them) are passed in v0..v7 registers. However, a 32-byte vector is not a short-vector type, nor a homogeneous aggregate, so it is passed as a generic struct, via the v8 pointer.

Also based on that code I have the suspicion that the first branch (where byte_size <= 16) is not actually used or necessary and also I don't see anything in the ABI documentation indicating otherwise (it would be a pretty crazy ABI if they say that if you have 4 double then passed in a single 32 byte register while if you have 8 double then passed in 8 different 32 byte registers). Can you make sure that branch is necessary (e.g. removing it breaks at least 1 test)?

Removing the branch makes the test for 8 and 16-byte vectors fail.

tberghammer accepted this revision.May 5 2017, 3:39 AM

Makes sense. Thank you for the explanation (I assumed homogeneous aggregate and vector are the same).

This revision is now accepted and ready to land.May 5 2017, 3:39 AM
This revision was automatically updated to reflect the committed changes.