This is an archive of the discontinued LLVM Phabricator instance.

Optimize shift and accumulate pattern in AArch64.
ClosedPublic

Authored by adriantong1024 on Nov 22 2021, 4:48 PM.

Details

Summary

AArch64 supports unsigned shift right and accumulate. In case we see a
unsigned shift right followed by an OR. We could turn them into a USRA
instruction, given the operands of the OR has no common bits.

Diff Detail

Event Timeline

adriantong1024 created this revision.Nov 22 2021, 4:48 PM
adriantong1024 requested review of this revision.Nov 22 2021, 4:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 4:48 PM

Could this be handled with a tablegen pattern? Something like the or_is_add pattern from X86 (but using haveNoCommonBitsSet). Maybe a PatFrag that accepts an add or an add-like-or?

Also should ssra be treated the same way?

llvm/test/CodeGen/AArch64/shift-accumulate.ll
13

It's probably a better test to return the %5, not extract a single lane from it.

Could this be handled with a tablegen pattern? Something like the or_is_add pattern from X86 (but using haveNoCommonBitsSet). Maybe a PatFrag that accepts an add or an add-like-or?

Thanks for the comment David. I will take a look at handling in tablegen pattern as well. Will update when I have more.

Also should ssra be treated the same way?

Address Dave's comment. Moving the rules into tablegen.

Thanks. I like the added Known bits - they can be useful in many situations.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1844

We know that the operands 1 and 2 will be constants, so I think we can just grab the values for them directly:

uint64_t Mask = ~(Op->getConstantOperandVal(1) << Op->getConstantOperandVal(2));

The Mask then specifies which bits are known to be 0.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
8366

Now that we have this, it may be worth using it in the existing patterns, using a PatFrags that accepts either "add" or "or_is_add". That would save the need for new patterns.

llvm/test/CodeGen/AArch64/shift-accumulate.ll
2

Can you run the update_llc_test_checks on the file? It's missing some expected output.

It's also worth adding some simple case for each signedness / type that have tablegen patterns added / changed.

19

Unfortunately this doesn't verify. I think because the BIC code is giving incorrect Known bits.

Address Dave's comment.

Add 2 more test cases, testing different datatypes.

Thanks. Looks good, as far as I can tell.

Can you make sure there are some i64 tests, especially v1i64. Those have a different tablegen class, so it would be good to make sure they have tests covering them.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1836–1840

Is this chunk needed? I think the operands should be constants by definition.

llvm/test/CodeGen/AArch64/shift-accumulate.ll
3

I tend to use -mtriple=aarch64-none-eabi

5

You can usually remove dso_local and local_unnamed_addr #0 align 32, to make the tests a little cleaner.

Address Dave's comment. Thanks!

dmgreen added inline comments.Jan 17 2022, 2:16 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1836–1838

We don't appear to use Known2 and Known3

llvm/test/CodeGen/AArch64/shift-accumulate.ll
31

Can you add the <1 x i64> version of this test (and ssra_v2i64 too). It should then test the "scalar" instructions with "d0" and "d1" operands.
You can remove dso_local too.

dmgreen accepted this revision.Jan 17 2022, 2:40 AM

Other than the things I mentioned, this LGTM

This revision is now accepted and ready to land.Jan 17 2022, 2:40 AM

Address Dave's comments. Thanks!

dmgreen accepted this revision.Jan 18 2022, 5:19 AM

Thanks!

This revision was landed with ongoing or failed builds.Jan 19 2022, 5:58 PM
This revision was automatically updated to reflect the committed changes.