This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use NEON's tbl1 for 16xi8 and 8xi8 build vector with mask.
ClosedPublic

Authored by lawben on Mar 16 2023, 3:31 AM.

Details

Summary

When using Clang's __builtin_shufflevector with a 16xi8 or 8xi8 source and runtime mask on an AArch64 target, LLVM currently generates 16 or 8 extract+and+insert operations. This patch replaces these inserts with (a vector AND +) NEON's tbl1 intruction.

Issue: https://github.com/llvm/llvm-project/issues/60515

Diff Detail

Event Timeline

lawben created this revision.Mar 16 2023, 3:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 3:31 AM
lawben requested review of this revision.Mar 16 2023, 3:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 3:31 AM
lawben retitled this revision from Use NEON' tbl1 for 16xi8 build vector with mask. to Use NEON's tbl1 for 16xi8 build vector with mask..Mar 16 2023, 3:35 AM
lawben retitled this revision from Use NEON's tbl1 for 16xi8 build vector with mask. to [AArch64] Use NEON's tbl1 for 16xi8 build vector with mask..Mar 16 2023, 3:45 AM
fhahn added a subscriber: fhahn.Mar 16 2023, 10:22 AM
fhahn added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10214

Why limit this it v16i8?

10215

It looks like we are missing test coverage for most (all?) exits. It would be great if you could extend the test coverage.

lawben updated this revision to Diff 506065.Mar 17 2023, 6:30 AM

Add tests for exit cases and extend to v8i8.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10214

Good point. I've added v8i8 too. I only had the v16i8 use case in mind when writing this. This also works for other vector sizes, but that requires some additional logic to multiply and and byte offsets, as TBL only supports v16i8 and v8i8 directly. I've seen some of that around the code and I think x86 instruction selection also does this. If you think this is valuable and the right place to do it, I'd be happy to implement that in a follow-up patch.

10215

I've added a few tests to cover the exits. Please let me know if you think something is missing. I also noticed that some of my assumptions were too pessimistic in the process, so I removed them.

lawben retitled this revision from [AArch64] Use NEON's tbl1 for 16xi8 build vector with mask. to [AArch64] Use NEON's tbl1 for 16xi8 and 8xi8 build vector with mask..Mar 17 2023, 8:08 AM
lawben edited the summary of this revision. (Show Details)

Hi - I like the idea of generating tbl instructions from non-constant masks. Does the patch need an update to be based against main?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10201–10294

Should something check that the inner extract element (MaskSource) has an index equal to i?

10201–10294

This doesn't need the cast any more.

10203

This can test against the EVT directly: VT != MVT::v16i8 && .. .
Other types could be handled too, by breaking the larger type up into i8 lanes. i.e a i32 would add [idx*4,idx*4+1,idx*4+2,idx*4+3] for example, if you get my meaning.
I'm not sure if that is worth adding to this patch, or handling in another but it would be good to keep in mind.

10209

You can concat with undef if all the needed elements are in the bottom half.

llvm/test/CodeGen/AArch64/neon-shuffle-vector-tbl.ll
12

It would be good to have a test case without this And mask, just using %mask directly.

lawben updated this revision to Diff 506937.Mar 21 2023, 6:08 AM
lawben marked 3 inline comments as done.

Add a path without AND mask.

We now have a path that does not need an AND on the mask source. We now either expect all elements to have an AND or none.
Also addresses some review comments.

@dmgreen Thanks for the feedback. I've addressed the comments you made. I've renamed the method, as it does not require an AND anymore.

re: update to base against main. I tried to merge the current main branch into another patch (using git) and completely broke my setup, i.e., arc is doing really weird things, trying to lint hundreds of files, etc. So I'm not sure what the correct way is to do this. Do you maybe have a pointer for me? This is my first patch, so I'm new to this setup.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10201–10294

Yes, good catch. I've added a check and a test.

10203

I also had this in mind but was thinking about putting it in another patch. v16i8 and v8i8 directly map, so they are easier to generate. Other sizes have some more overhead, so it makes sense to split them up. I've seen the logic for this in X86 instruction selection, so I guess that could just be reused. I'll do that in a follow-up, once this is merged.

llvm/test/CodeGen/AArch64/neon-shuffle-vector-tbl.ll
12

Makes sense. There is actually no reason to limit this to and-masked masks. So I added a path that either expects all elements to have an AND mask or none. If there are no ANDs, we can omit it and directly emit a tbl. I was looking at this from Clang's __builtin_shufflevector() (as this is my use) and that always adds an AND.

lawben edited the summary of this revision. (Show Details)Mar 21 2023, 6:31 AM

@dmgreen Thanks for the feedback. I've addressed the comments you made. I've renamed the method, as it does not require an AND anymore.

re: update to base against main. I tried to merge the current main branch into another patch (using git) and completely broke my setup, i.e., arc is doing really weird things, trying to lint hundreds of files, etc. So I'm not sure what the correct way is to do this. Do you maybe have a pointer for me? This is my first patch, so I'm new to this setup.

I'm afraid I'm a bit of a luddite who just uploads patches manually into phabricator and doesn't use arc. I usually use git rebase -i to squash patches into a single commit, and upload them from there. Maybe the same thing will help with arc?

lawben updated this revision to Diff 507322.Mar 22 2023, 5:21 AM

Rebase main

@dmgreen Okay, I rebased onto main and submited via arc. I hope this did the right thing, but it doesn't look broken. I guess my mistake in the other patch was to create a merge commit instead of rebasing. For future reference, I'll just always squash into in commit.

@dmgreen Okay, I rebased onto main and submited via arc. I hope this did the right thing, but it doesn't look broken. I guess my mistake in the other patch was to create a merge commit instead of rebasing. For future reference, I'll just always squash into in commit.

I use arc a lot and rebase things frequently. What you did may not even be incorrect, but arc has a habit of doing crazy things for small mistakes. It's usually better to find a safe method rather than trying to understand why it did what it did.

I usually have the patches that are in review in a branch. So if I want to rebase them onto main, I update main, checkout the branch. Then git rebase main to update them all. Then to update each review I interactive rebase back to each patch and run arc diff from there.

You can apply the same thing if you have a series of say 5 patches, and someone asks you to update patch 1 which means you have to change all the rest. Rebase back to patch 1, marking all the commits to edit. Do your changes, arc diff, then continue the rebase, fixing up each commit as you go and updating the reviews.

lawben updated this revision to Diff 508030.Mar 24 2023, 3:45 AM

An honest attempt to do a rebase with arc :)

dmgreen accepted this revision.Mar 27 2023, 7:40 AM

Thanks - that does look better now. As far as I can see this LGTM. Thanks for the patch.

This revision is now accepted and ready to land.Mar 27 2023, 7:40 AM

@fhahn is there anything missing from your side? If not, could one of you or @dmgreen merge this? As this is my first patch, I obviously do not have push access :)

Hello. I can merge, do you have a "name <email@domain.com>" I can use for git attribution?

Thanks. Could you please use "Lawrence Benson <github@lawben.com>".

fhahn accepted this revision.Mar 29 2023, 5:34 AM

Thanks, looks good from me side as well.

This revision was landed with ongoing or failed builds.Mar 29 2023, 7:26 AM
This revision was automatically updated to reflect the committed changes.