This is an archive of the discontinued LLVM Phabricator instance.

Implement aarch64 neon instruction class SIMD copy - LLVM
ClosedPublic

Authored by kevin.qin on Oct 7 2013, 10:32 PM.

Details

Reviewers
jmolloy
Summary

This patch implements the AArch64 NEON SIMD copy instruction class.
I rebased this patch on the latest trunk.

This patch contains:

  1. Implemented the SIMD dup instruction.
  2. Removed DupImm and used VDUP instead.

Known problem:

  1. In AAPCS64, half floats can be used as function arguments and return values, so some ACLE intrinsics receive or return a half-float variable. But this feature isn't supported by Clang at the moment. Even when I hack the diagnostic to avoid the error message, all half floats are converted to i16 in IR.
  2. There is no v4f16 type in IR, which is needed by some ACLE intrinsics.

Considering the above problems, the ACLE intrinsics involving half floats are not implemented in these patches. We will try to figure out how to implement this feature.
Please review, thanks.

Diff Detail

Event Timeline

Hi Kevin,

Overall looks good, but I have quite a few comments on individual areas.

Cheers,

James

lib/Target/AArch64/AArch64ISelLowering.cpp
75

This patch doesn't implement half-float support, so this line seems out of place.

309

Again, as the patch avoids full half-float support, this line seems out of place.

322

Half-float, out of place.

3505

I believe this code is copied from the ARM (32-bit) backend. It's quite substantial, and it would be nice if it could be factored away somewhere in a helper function and reused by both backends.

3640

This comment doesn't help understand the code below. The code actually checks if this is a BUILD_VECTOR with a non-constant operand in index 0, and the splat lane is 0.

3642

This condition seems wrong to me. The code *looks* like it produces a VDUP of a constant to all lanes (as opposed to the VDUPLANE below).

Why does this only work when the operand is *not* a constant? I would have thought that the condition should be negated - && isa<ConstantSDNode>(V1.getOperand(0));

Also, why does it only work when the splat lane is 0? can it not be: isa<ConstantSDNode>(V1.getOperand(Lane))?
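For reference, a self-contained toy model of the splat check being discussed (the Operand struct and function name are illustrative stand-ins for the real SelectionDAG types, not LLVM API):

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for one operand of a BUILD_VECTOR node:
// Id identifies the underlying value, IsConstant marks constants.
struct Operand {
  int Id;
  bool IsConstant;
};

// True when every lane repeats lane 0's value, i.e. the BUILD_VECTOR
// is a splat of operand 0 (the shape the condition at 3642 guards).
bool isSplatOfLane0(const std::vector<Operand> &Ops) {
  if (Ops.empty())
    return false;
  for (const Operand &Op : Ops)
    if (Op.Id != Ops[0].Id)
      return false;
  return true;
}
```

Whether the guarded lowering should additionally require the lane-0 operand to be constant or non-constant is exactly the question raised above.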

3664

Pedantic: Space between // and C

3686

This is a weird-looking condition - the canonical condition would be "NV1Elt.size() == Length", would it not?

3691

Pedantic: // space D.

3695

As these variables are only used once, I suggest removing them and just using .size() in the if condition:

if (NV1Elt.size() != Length || NV2Elt.size() != Length)

3698

Would it be better to call this "InsMasks" or something instead of "InsArray"? That way we know that one array contains the masks, the other contains the indices.

3702

Coding style: if you're using braces around the if(), then you should use braces around the else().

3707

Several changes here:

  • I'd use "I" instead of "InsertNum" (nitpick, feel free to ignore)
  • I'd use "E" instead of "Index" (or at least something that is less generic than Index - Index doesn't feel like the name for an "end" variable)
  • Index = InsArray.size()
3713

Instead of mutating InsArray in-place, I'd create a variable and change that. Something like:

int Mask = InsArray[InsertNum];
if (Mask > V1EltNum) {
  ExtV = V2;
  Mask -= V1EltNum;
}
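As a sanity check on that suggestion, here is a self-contained toy version of the loop. The names mirror the review, but this is a model over plain vectors, not the backend code; it uses >= because in a combined shuffle mask over V1 ++ V2, the lanes of the second vector start at V1EltNum:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model: for each entry of a combined shuffle mask over V1 ++ V2,
// return which source it selects (0 = V1, 1 = V2) and the lane index
// rebased into that source, without mutating the mask array in place.
std::vector<std::pair<int, int>>
resolveInsMasks(const std::vector<int> &InsMasks, int V1EltNum) {
  std::vector<std::pair<int, int>> Out;
  for (int I = 0, E = (int)InsMasks.size(); I != E; ++I) {
    int Mask = InsMasks[I]; // local copy; InsMasks stays untouched
    int Source = 0;         // default: extract from V1
    if (Mask >= V1EltNum) { // lane lives in V2
      Source = 1;
      Mask -= V1EltNum;     // rebase into V2's lane numbering
    }
    Out.push_back({Source, Mask});
  }
  return Out;
}
```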

3715

Please add a comment as to why this is necessary! Won't this get legalized anyway?

lib/Target/AArch64/AArch64InstrNEON.td
2488

Can you say why these (i32 X) are needed?

3350

I thought this patch wasn't meant to add half-float support?

4512

These patterns match SMOV nodes - would it not be better to pattern-match (sext (COPY)) instead? That way the copy coalescer could be more effective. Same for UMOV nodes: (zext (COPY)).

Hi Kevin,

Mostly I agree with James's comments. Just one extra and one justification.

lib/Target/AArch64/AArch64ISelLowering.cpp
869–870

You should add code to print out a NEON_VDUP node

lib/Target/AArch64/AArch64InstrNEON.td
4512

They appear to be copies from lanes, and I don't think you can write a COPY node that does that.

(Well, unless we added sub8_0, ..., sub8_15 indices, and even if you could make that work I'd worry LLVM would decide to start tracking v0.16b[4] separately from v0.16b[5] causing nasty partial register dependencies)

silviu.baranga added inline comments.Oct 8 2013, 6:57 AM
lib/Target/AArch64/AArch64InstrNEON.td
4512

Also the zext node can't be in the pattern output and I can't see any good way this can be expressed in terms of copies. I think the current implementation is the best solution for now.
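For readers unfamiliar with these instructions, a toy model of the semantics the patterns capture (the function names are illustrative, not backend helpers): SMOV extracts a vector lane and sign-extends it to a general-purpose register, UMOV extracts and zero-extends.

```cpp
#include <cassert>
#include <cstdint>

// SMOV Wd, Vn.B[lane]: extract a byte lane, sign-extend to 32 bits.
int32_t smov_b(const int8_t Vn[16], int Lane) {
  return static_cast<int32_t>(Vn[Lane]);
}

// UMOV Wd, Vn.B[lane]: extract a byte lane, zero-extend to 32 bits.
uint32_t umov_b(const uint8_t Vn[16], int Lane) {
  return static_cast<uint32_t>(Vn[Lane]);
}
```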

kevin.qin updated this revision. Oct 9 2013, 2:46 AM

Implemented most of the suggestions.

Comment at: lib/Target/AArch64/AArch64InstrNEON.td:2488
@@ -2488,3 +2487,3 @@

   (v8i8 (trunc (v8i16 (srl (v8i16 node:$Rn),
-                           (v8i16 (Neon_dupImm 8))))))>;
+                           (v8i16 (Neon_vdup (i32 8)))))))>;

 def _4s : PatFrag<(ops node:$Rn),

Can you say why these (i32 X) are needed?

If I remove the value type here, TableGen fails with an error:

  Could not infer all types in pattern result!

So I kept these value types.

jmolloy accepted this revision.Oct 9 2013, 3:39 AM

Hi,

This now LGTM.

Thanks!

James

Hi,

Thanks, James. If there are no more comments today, I will commit it tomorrow.


http://llvm-reviews.chandlerc.com/D1854

Eugene.Zelenko closed this revision.Oct 4 2016, 6:22 PM
Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in rL192410.