This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Combine UADDVs to generate vector add
ClosedPublic

Authored by mivnay on Oct 2 2020, 5:31 AM.

Details

Diff Detail

Event Timeline

mivnay created this revision.Oct 2 2020, 5:31 AM
mivnay requested review of this revision.Oct 2 2020, 5:31 AM
mivnay updated this revision to Diff 295832.Oct 2 2020, 8:19 AM
mivnay updated this revision to Diff 297403.Oct 10 2020, 3:37 AM

Add support for all integer types.

The pre-commit check is not being helpful here... :-/

Seems like a useful optimisation to me though.

Can you add test for i16 and i8. As far as I understand they will not fold because we will have legalized types and the return type will not match the vector element type? It's still worth having the tests. We could think of doing this target independent instead. Folding add(vecreduce(x), vecreduce(y)) -> vecreduce(add(x, y)). It sounds generally useful to me. World that work in your case, or would it specifically need to work on UADDV?

Can you also run the update_llc_test_checks.py script on the file and pre-commit the tests, just showing the changes here.

lib/Target/AArch64/AArch64ISelLowering.cpp
12279 ↗(On Diff #297403)

--> UADDV(ADD a, b)

12281 ↗(On Diff #297403)

You can get DAG from DCI.DAG (and I don't think think method uses DCI, which might simplify things).

12376 ↗(On Diff #297403)

I think this can be:

if (SDValue Val = performUADDVCombine(N, DCI, DAG))
  return Val;

Can you also run the update_llc_test_checks.py script on the file and pre-commit the tests, just showing the changes here.

Sure. I have created new patch : https://reviews.llvm.org/D89365

The pre-commit check is not being helpful here... :-/

Seems like a useful optimisation to me though.

Can you add test for i16 and i8. As far as I understand they will not fold because we will have legalized types and the return type will not match the vector element type?

Yes!

We could think of doing this target independent instead. Folding add(vecreduce(x), vecreduce(y)) -> vecreduce(add(x, y)). It sounds generally useful to me. World that work in your case, or would it specifically need to work on UADDV?

I don't need to use UADDV specifically. But, I am not sure what impact it has on other targets (different lowerings, patterns, etc..).

mivnay updated this revision to Diff 298117.Oct 14 2020, 4:43 AM
mivnay edited the summary of this revision. (Show Details)
mivnay marked 3 inline comments as done.
dmgreen accepted this revision.Oct 14 2020, 1:30 PM

Yeah. I was going to say that this would replace 3 instructions with 2, so would generally be useful.

But thinking about it, that wouldn't actually be the case for MVE. The two reductions (VADDVA's) would likely overlap better than back to back loads.

The code here looks good to me. If we find we need to handle more cases in the future we can look at it there, and perhaps put some sort of "undo" in for MVE. But in the meantime this patch LGTM.

This revision is now accepted and ready to land.Oct 14 2020, 1:30 PM
This revision was landed with ongoing or failed builds.Oct 14 2020, 9:30 PM
This revision was automatically updated to reflect the committed changes.