This is an archive of the discontinued LLVM Phabricator instance.

ARMEB: Vector extend operations
ClosedPublic

Authored by cpirker on Jun 6 2014, 6:44 AM.

Details

Reviewers
cpirker
Summary

Hi all,

This patch implements correct vector extensions from <2 x 8>, <2 x 16>, and <2 x 32> to <2 x 64>, and from <4 x 8> and <4 x 16> to <4 x 32> vector types for big endian.
It applies to both integer and floating-point vectors. The Clang vectorizer is likely to generate such vector extensions for array type casts.

Background: LLVM generates a scalar load for under-sized vectors and simply uses the loaded value as vector data.
In particular, LLVM employs the VLD1_LN instruction to load one lane of a vector.
This works well in little-endian mode. However, in big-endian mode, the scalar and vector data representations differ.
Please refer to http://llvm.org/docs/BigEndianNEON.html for more information on vectors in big endian.

Fix: Insert a corresponding VREV instruction to convert the scalar-loaded data into the vector representation in big-endian mode.

Please review.

Thanks,
Christian

Diff Detail

Event Timeline

cpirker updated this revision to Diff 10179.Jun 6 2014, 6:44 AM
cpirker retitled this revision from to ARMEB: Vector extend operations.
cpirker updated this object.
cpirker edited the test plan for this revision. (Show Details)
cpirker added subscribers: Unknown Object (MLST), Konrad.

Hi Christian,

Could you please explain why this is necessary? I'm somewhat confused.

We have canonicalised on using LD1 for vector loads, and our register content is in the form "as if" loaded by an LD1. I therefore do not understand why you need a VREV32 after the LD1. The lane order should be correct, and all you need to do is lengthen.

In fact, we chose LD1 as our form precisely because we didn't want to change the vectorizer!

Cheers,

James

Hi James,

To load a 2x8 vector, LLVM generates an ld1.16; to load a 2x16 vector, an ld1.32 load is used. Therefore, VREV instructions are needed for a correct vector representation in registers.

Cheers,
Conny


http://reviews.llvm.org/D4043

Hi Konrad,

Thanks for the explanation. So, as I understand the problem, LLVM generates a scalar load for under-sized vectors.

I'd like to see a fuller explanation of what is going on and why in the source code and commit message. The important bits being:

  • We need to load an under-sized vector.
  • To do this we need to use a VLD1_LN to load one lane of a vector.
  • So we need to pretend that we're loading a larger vector element size than we are.
  • This means we load "as if" by VLDR, and need to perform a REV to get us back right again.

Also, the following testcase doesn't generate the best code sequence:

; CHECK-LABEL: vector_ext_2i32_to_2i64:
; CHECK: vldr [[REG:d[0-9]+]]
; CHECK: vrev64.32 [[REG]], [[REG]]
; CHECK: vmovl.u32 {{q[0-9]+}}, [[REG]]

That VLDR can be a VLD1.32 dX, which means we don't need a REV. Can you please change this? I suspect this only affects extloads from 64-bit types to 128-bit types.

Cheers,

James

Hi James,

Would you please accept http://reviews.llvm.org/D4043 as functionally OK,
and consider any VREV/VLD1 optimization as a separate issue?

Cheers,
Conny

Hi Konrad,

No, I don't think so. As I mentioned in my previous review comment, I would like to see more explanations in the code and commit message before I'm happy.

Also, I believe you're actually editing the code that generates this VLDR/VREV pair now. So I think that for 64-bit to 128-bit vector extloads, you can just use the little-endian version, not predicate it, and make little-endian generate an LD1 instead of a LDR.

Also, I take it this affects AArch64 too?

Cheers,

James

cpirker updated this revision to Diff 10442.Jun 16 2014, 4:21 AM
cpirker updated this object.

Updated summary and added some source comments.

cpirker updated this revision to Diff 10481.Jun 17 2014, 2:54 AM

Reverted the big-endian version of the "Lengthen_Single" pattern.
The current version is fine for both little- and big-endian modes.

Thanks,
Christian

Hi James,

Can you please review my third revision?

Thanks,
Christian

cpirker accepted this revision.Jun 23 2014, 11:52 AM
cpirker added a reviewer: cpirker.

I committed this patch as rL211520.

This revision is now accepted and ready to land.Jun 23 2014, 11:52 AM
cpirker closed this revision.Jun 23 2014, 11:52 AM