This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Generate dot for v16i8 sum reduction to i32
ClosedPublic

Authored by mivnay on Sep 30 2020, 8:24 AM.

Details

Summary

Convert VECREDUCE_ADD( EXTEND(v16i8_type) ) to VECREDUCE_ADD( DOTv16i8(v16i8_type) ) whenever the result type is i32. This gains in one of the SPECCPU 2017 benchmark.

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Diff Detail

Event Timeline

mivnay created this revision.Sep 30 2020, 8:24 AM
mivnay requested review of this revision.Sep 30 2020, 8:24 AM
mivnay updated this revision to Diff 295303.Sep 30 2020, 8:42 AM

I haven't looked in much detail at this patch, but this looks like some straightforward lowering of llvm.experimental.vector.reduce.add. Absolutely nothing wrong with that, but I am curious who's going to produce this intrinsic? The vectoriser, the matrix pass? In other words, any ideas on the bigger picture?

mivnay added a comment.EditedOct 1 2020, 2:51 AM

I haven't looked in much detail at this patch, but this looks like some straightforward lowering of llvm.experimental.vector.reduce.add. Absolutely nothing wrong with that, but I am curious who's going to produce this intrinsic? The vectoriser, the matrix pass? In other words, any ideas on the bigger picture?

Thanks for looking into the patch. The pattern added in lit test gets generated from the below C code:

#include <stdint.h>
#include <stdlib.h>

int func(uint8_t *a) {
  int sum = 0;
  for (int i = 0; i < 16; i++) {
    sum += a[i];
  }

  return sum;
}

EDIT: SLP vectorizer generates the pattern in this case.

Hello

Can you update with full context? -U999999. It makes phabriactor reviews easier to follow.

I had thought about this somewhat in reference to inloop reductions. I had presumed that it would need some form of partial reduction though, as you would want part of the reduction would then happen outside the loop (I think)

Improving codegen on it's own is good, but I'm interested in seeing how this fits with the other patches.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10970

We ideally shouldn't be just producing a machine node here. Can you add a AArch64ISD::UDOT node?

We should be doing the same for SDOT too.

mivnay edited the summary of this revision. (Show Details)Oct 1 2020, 3:18 AM
mivnay added a comment.Oct 1 2020, 3:29 AM

Hello

Can you update with full context? -U999999. It makes phabriactor reviews easier to follow.

I had thought about this somewhat in reference to inloop reductions. I had presumed that it would need some form of partial reduction though, as you would want part of the reduction would then happen outside the loop (I think)

Improving codegen on it's own is good, but I'm interested in seeing how this fits with the other patches.

Hi,

I am working on the performance related issues mentioned in the bug and meta ticket. I have three unrelated patches (i.e., patterns) for codegen improvements. This is the first one.

mivnay updated this revision to Diff 295529.Oct 1 2020, 4:36 AM

Added support for SDOT

mivnay added inline comments.Oct 1 2020, 4:41 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10970

I have added the support for SDOT.

Can you add a AArch64ISD::UDOT node?

Can this be done in a later patch?

mivnay retitled this revision from [AArch64] Generate udot for v16i8 sum reduction to i32 to [AArch64] Generate dot for v16i8 sum reduction to i32.Oct 1 2020, 4:42 AM
mivnay edited the summary of this revision. (Show Details)

EDIT: SLP vectorizer generates the pattern in this case.

Ah I see.

https://llvm.org/docs/Phabricator.html#phabricator-request-review-web has some details on creating diffs with extra context.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10949–10951

Any single statement blocks do not need { } brackets.

10964–10965

This second isScalable check isn't needed, and maybe move the num elements == 16 check up to the VT check, by checking for v16i32 directly?

10970

It's best not create machine nodes directly in the DAG...

Maybe create a int_aarch64_neon_sdot intrinsic node instead? That will avoid the need for a new node, and go through the existing tablegen patterns.

It would be nice to be able to fold add(x, UDOT(0, y)) -> UDOT(x, y), which is where having a node will really become useful. That needn't be done here though.

10972

EVT::getVectorVT(*DAG.getContext(), MVT::i32, 4) -> MVT::v4i32

I wasn't aware that you could use getConstant directly on vector types like this. Neat.

10981

Why is this is calling ReplaceAllUsesOfValueWith? Is returning the value not enough?

llvm/test/CodeGen/AArch64/neon-dot-product.ll
261

Can you update the tests to show all the instructions? The 1's and 0's and addv's are important here too.

I would just use the update_llc_test_checks script.

mivnay updated this revision to Diff 295737.Oct 1 2020, 11:13 PM

Fixed context and review comments

mivnay marked 7 inline comments as done.Oct 1 2020, 11:14 PM
dmgreen accepted this revision.Oct 2 2020, 12:29 AM

Thanks. LGTM with one minor modification.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11015

EVT::getVectorVT(*DAG.getContext(), MVT::i32, 4) -> MVT::v4i32 :)

This revision is now accepted and ready to land.Oct 2 2020, 12:29 AM
mivnay updated this revision to Diff 295784.Oct 2 2020, 3:47 AM
mivnay marked an inline comment as done.
mivnay added a comment.Oct 2 2020, 5:38 AM

The build failure is unrelated to this patch and happening in other patches as well (Example: https://reviews.llvm.org/D88471). What should I do? Also, I do not have commit access. Can you please help in committing this patch?

The build failure is unrelated to this patch and happening in other patches as well (Example: https://reviews.llvm.org/D88471). What should I do? Also, I do not have commit access. Can you please help in committing this patch?

Yeah I tend to ignore those failures. I can check it though.

I can certainly commit this. I just need an author string to attribute it correctly. Is "Vinay Madhusudan <vinay@compilertree.com>" OK for that?

mivnay added a comment.Oct 2 2020, 8:27 AM

I can certainly commit this. I just need an author string to attribute it correctly. Is "Vinay Madhusudan <vinay@compilertree.com>" OK for that?

Yes, thanks a lot.

This revision was landed with ongoing or failed builds.Oct 2 2020, 9:17 AM
This revision was automatically updated to reflect the committed changes.