This is an archive of the discontinued LLVM Phabricator instance.

[X86] Alternative algorithm for vector-vector shifts
Needs ReviewPublic

Authored by Nekotekina on Mar 18 2023, 4:25 AM.

Details

Summary

Uses trivial in-lane instructions to avoid expensive cross-lane shuffles. Sometimes prevents using higher YMM/ZMM registers.

Diff Detail

Event Timeline

Nekotekina created this revision.Mar 18 2023, 4:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2023, 4:25 AM
Nekotekina requested review of this revision.Mar 18 2023, 4:25 AM
n-omer added a subscriber: n-omer.Apr 4 2023, 6:13 AM

@Nekotekina are you still looking at this?

@RKSimon Yes, I'd like to get a review

RKSimon edited reviewers, added: RKSimon, pengfei, LuoYuanke; removed: Restricted Project.Apr 23 2023, 4:37 AM

It might be better to split the rotate/funnel and general shift lowering into 2 separate patches.

llvm/lib/Target/X86/X86ISelLowering.cpp
31023

Describe the codegen?

31028

This logic should be something like if !supportedVectorVarShift(VT) && supportedVectorVarShift(WideVT) ?

31041

Lo/Hi might be better names?

31044

getTargetVShiftByConstNode ?

31046

Don't you check for !DAG.isSplatValue(Amt) in the opening if() ?

31054

LowerShift has a unsigned Opc we can use

31077

Do we need the ternlog path? Don't existing combines deal with it later (canonicalizeBitSelect etc.)?

31454

It'd be better if you can adjust the logic below instead of having a second copy.

31667

See below what? Add a new self-contained comment instead of relying on a code layout.

31680

clang-format (indentation)

31693

Same for funnel shifts - by tweaking this logic we might be able to avoid the duplicate code.

LuoYuanke added inline comments.Apr 23 2023, 5:44 AM
llvm/test/CodeGen/X86/avx2-shift.ll
547

The instruction count seems to increase from 5 to 7. Could you check if it improves performance with "https://uica.uops.info" or llvm-mca?

@RKSimon Thanks for the review, I created another diff with only LowerRotate changes: https://reviews.llvm.org/D149071

@LuoYuanke Added some analysis with https://uica.uops.info

llvm/test/CodeGen/X86/avx2-shift.ll
547

Yes, it looks like an improvement unless I interpret website results wrong:
Left: https://bit.ly/3Vglo5l
Right: https://bit.ly/41ZSmZP

575
601