This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Eliminating the use of the integer unit when moving the Neon scalar result of a uaddlv into a Neon vector
ClosedPublic

Authored by nilanjana_basu on Jan 25 2023, 6:12 PM.

Diff Detail

Event Timeline

nilanjana_basu created this revision. Jan 25 2023, 6:12 PM
Herald added a project: Restricted Project. Jan 25 2023, 6:12 PM

Added all the variants of uaddlv, added 9 more unit tests & reduced the code for the pattern match.

Added more patterns to match different types of vector insertion for the same set of 'uaddlv' functions

nilanjana_basu published this revision for review. Jan 31 2023, 6:37 PM

The code generated for inserting the scalar result of a 'uaddlv' intrinsic into a vector exhibits a redundant move: the value is first moved to an integer register and then moved back to the destination Neon vector register, even though the move could be done directly between Neon registers. In my understanding, this happens because the instruction selected for the vector insertion expects the value in an i32/i64 integer register, and at instruction selection time the instruction selected for 'uaddlv' reports an i32/i64 result, whereas in practice the result of 'uaddlv' lands in a Neon register. I therefore added patterns that match the combined use of these two operations (uaddlv & vector_insert) and generate a direct move from the source to the destination Neon register.
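To illustrate, the generated code looks roughly like the following (a hedged sketch; the register numbers and element sizes are illustrative, not taken from an actual test):

uaddlv  h0, v1.8b        ; reduction result lands in a Neon register
fmov    w8, s0           ; redundant round trip through the integer unit
mov     v2.h[0], w8      ; insert from the GPR into the destination vector

With the added patterns, the insert becomes a direct lane-to-lane move:

uaddlv  h0, v1.8b
mov     v2.h[0], v0.h[0]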

I am seeking feedback on whether there exists a better/more generic way of doing this, since currently we are matching patterns case by case. Even though I tried to be exhaustive, it's easy to miss a pattern. Please let me know your thoughts on this.

Herald added a project: Restricted Project. Jan 31 2023, 6:37 PM

Tablegen patterns are always an option - there is no problem with adding those. Like you say, the lowering of ADDV reductions is a little odd in places. If you wanted to fix this very generically, it could be written as a post-isel peephole optimization, perhaps in AArch64MIPeepholeOpt. I've not looked deeply into the issues, but it could transform fmov w8, s1; mov.s v2[0], w8 into a lane move instruction, replacing two fairly slow instructions with a fast one. Happening later, it might help in more cases than just the reductions.
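For the tablegen route, a pattern along these lines is the sort of thing meant (a sketch covering one uaddlv variant only; the exact operand wrapping, such as the SUBREG_TO_REG step, is an assumption rather than the pattern from this patch):

def : Pat<(v8i16 (vector_insert (v8i16 V128:$src),
                   (i32 (int_aarch64_neon_uaddlv (v8i8 V64:$Rn))), (i64 0))),
          (INSvi16lane V128:$src, 0,
                       (SUBREG_TO_REG (i64 0), (UADDLVv8i8v V64:$Rn), hsub), 0)>;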

The other option is to do some lowering earlier: split the intrinsic into two operations (the reduction, and the vector extract), and select them separately. We already have AArch64ISD opcodes for other reductions; see combineAcrossLanesIntrinsic in AArch64ISelLowering.cpp. This could unlock additional optimizations. (For example, we could combine the insert with another shuffle.)
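Roughly, the split lowering would look like this in DAG terms (pseudo-notation; the AArch64ISD::UADDLV opcode name is an assumption modeled on the existing reduction opcodes):

t1: i32   = AArch64ISD::UADDLV t0        ; the reduction, selected on its own
t2: v8i16 = insert_vector_elt v, t1, 0   ; the insert, now visible to DAG combines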

Thank you everyone for the feedback. I am currently working on the post-isel peephole optimization since that can capture the more generic pattern of

%y:gpr = COPY %x:fpr
%z:fpr = INSvi..gpr .., a, %y:gpr

& replace it with
%z:fpr = INSvi..lane .., a, %x:fpr, 0

which should apply to a larger class of code than vector insertion of the uaddlv intrinsic result.
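A concrete instance might look like this (register numbers and the 32-bit lane width are illustrative, with %4 an fpr128 register):

%5:gpr32 = COPY %4.ssub
%6:fpr128 = INSvi32gpr %3, 1, %5

which would become

%6:fpr128 = INSvi32lane %3, 1, %4, 0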
I'll update the patch & problem description once I have a fully working solution.

Moved the optimization from ISel phase to Post-ISel peephole optimization

Thanks - this looks like it should be pretty generic.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
561–565

Will these ever not be true?

577

Please clang-format. You can also drop the brackets from single-statement ifs.
Is it worth checking that the subreg indices are as-expected? I'm not sure they can actually be incorrect.

595

It may not be Killed if it has other uses.

600

If this is removing instructions, does it need to check that the COPYs have 1 use?
Could it just remove MI and let the others be removed naturally if they are no longer used?

llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
3505

Are you using llvm/utils/update_llc_test_checks.py on the file? I'm not sure the indenting should change.
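For reference, the script is run directly on the test file and regenerates the CHECK lines from the current llc output:

llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll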

llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
112

Some of this can often be removed to make the tests a little simpler.

fhahn added inline comments. Feb 13 2023, 6:46 AM
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
595

If this is not covered by the existing tests, could you add one that covers this case?

llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
25

you should be able to keep the function here to a minimum, i.e.

define void @insert_vec_v6i64_uaddlv_from_v4i32() {
  ret void
}

Then you have to remove the references to LLVM IR in the MIR itself, e.g.

bb.0.entry: -> bb.0:

STRDui killed %17, %0, 2 :: (store (s64) into %ir.0 + 16) -> STRDui killed %17, %0, 2 :: (store (s64))
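Put together, the reduced test would follow the usual MIR test layout, roughly like this (a sketch; the RUN line is an assumption based on how single machine passes are normally run, with the reduced machine instructions going under bb.0):

# RUN: llc -mtriple=aarch64 -run-pass=aarch64-mi-peephole-opt -verify-machineinstrs %s -o - | FileCheck %s
--- |
  define void @insert_vec_v6i64_uaddlv_from_v4i32() {
    ret void
  }
...
---
name: insert_vec_v6i64_uaddlv_from_v4i32
body: |
  bb.0: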

86

I think most of those should be removable. It would also be good to add the new test separately.

92

You can probably remove this as well.

nilanjana_basu marked 4 inline comments as done.

Addressed most reviewer comments - removed the kill flag on the source register, dropped the removal of dangling COPY instructions, removed unnecessary checks, and updated the test scripts.

Removed a redundant modification in a file

nilanjana_basu marked 5 inline comments as done. Feb 20 2023, 12:48 AM

Addressed the reviewer comments.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
561–565

I haven't found a case where they aren't. Removed the checks in the latest patch.

577

Please clang-format. You can also drop the brackets from single-statement ifs.

Done

Is it worth checking that the subreg indices are as-expected? I'm not sure they can actually be incorrect.

The check for the source register being virtual and of type FP128 is needed. I removed the rest.

595

It may not be Killed if it has other uses.

I replaced the 'kill' flag with the original flag of the source register.

600

Deleted the removal of dangling COPY instructions to allow them to be handled naturally.

llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
3505

Thank you for mentioning this - wasn't aware of this tool before. Updated the files.

nilanjana_basu marked an inline comment as done.

Removed redundant lines from MIR test file

dmgreen accepted this revision. Feb 20 2023, 6:29 AM

Thanks. From what I can see this LGTM.

Peephole optimizations have a habit of causing subtle problems. Please make sure you at least run a bootstrap to see whether any problems arise.

This revision is now accepted and ready to land. Feb 20 2023, 6:29 AM

Merged with latest codebase & updated the tests

Thank you for the review. I ran all unit tests, llvm-lit tests with llvm-test-suite & a bootstrap build test.

This revision was landed with ongoing or failed builds. Feb 27 2023, 12:25 PM
This revision was automatically updated to reflect the committed changes.