Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

Optimize shift and accumulate pattern in AArch64.
ClosedPublic

Authored by adriantong1024 on Nov 22 2021, 4:48 PM.

Details

Summary

AArch64 supports unsigned shift right and accumulate. In case we see a
unsigned shift right followed by an OR. We could turn them into a USRA
instruction, given the operands of the OR has no common bits.

Diff Detail

Event Timeline

adriantong1024 created this revision.Nov 22 2021, 4:48 PM
adriantong1024 requested review of this revision.Nov 22 2021, 4:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 4:48 PM

Could this be handled with a tablegen pattern? Something like the or_is_add pattern from X86 (but using haveNoCommonBitsSet). Maybe a PatFrag that accepts an add or an add-like-or?

Also should ssra be treated the same way?

llvm/test/CodeGen/AArch64/shift-accumulate.ll
13

It's probably a better test to return the %5, not extract a single lane from it.

Could this be handled with a tablegen pattern? Something like the or_is_add pattern from X86 (but using haveNoCommonBitsSet). Maybe a PatFrag that accepts an add or an add-like-or?

Thanks for the comment David. I will take a look at handling in tablegen pattern as well. Will update when I have more.

Also should ssra be treated the same way?

Address Dave's comment. Moving the rules into tablegen.

Thanks. I like the added Known bits - they can be useful in many situations.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1844

We know that the operands 1 and 2 will be constants, so I think we can just grab the values for them directly:

uint64_t Mask = ~(Op->getConstantOperandVal(1) << Op->getConstantOperandVal(2));

The Mask then specifies which bits are known to be 0.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
8366

Now that we have this, it may be worth using it in the existing patterns, using a PatFrags that accepts either "add" or "or_is_add". That would save the need for new patterns.

llvm/test/CodeGen/AArch64/shift-accumulate.ll
2

Can you run the update_llc_test_checks on the file? It's missing some expected output.

It's also worth adding some simple case for each signedness / type that have tablegen patterns added / changed.

19

Unfortunately this doesn't verify. I think because the BIC code is giving incorrect Known bits.

Address Dave's comment.

Add 2 more test cases, testing different datatypes.

Thanks. Looks good, as far as I can tell.

Can you make sure there are some i64 tests, especially v1i64. Those have a different tablegen class, so it would be good to make sure they have tests covering them.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1836–1840

Is this chunk needed? I think the operands should be constants by definition.

llvm/test/CodeGen/AArch64/shift-accumulate.ll
3

I tend to use -mtriple=aarch64-none-eabi

5

You can usually remove dso_local and local_unnamed_addr #0 align 32, to make the tests a little cleaner.

Address Dave's comment. Thanks!

dmgreen added inline comments.Jan 17 2022, 2:16 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1836–1838

We don't appear to use Known2 and Known3

llvm/test/CodeGen/AArch64/shift-accumulate.ll
31

Can you add the <1 x i64> version of this test (and ssra_v2i64 too). It should then test the "scalar" instructions with "d0" and "d1" operands.
You can remove dso_local too.

dmgreen accepted this revision.Jan 17 2022, 2:40 AM

Other than the things I mentioned, this LGTM

This revision is now accepted and ready to land.Jan 17 2022, 2:40 AM

Address Dave's comments. Thanks!

dmgreen accepted this revision.Jan 18 2022, 5:19 AM

Thanks!

This revision was landed with ongoing or failed builds.Jan 19 2022, 5:58 PM
This revision was automatically updated to reflect the committed changes.