This is an archive of the discontinued LLVM Phabricator instance.

[X86] LowerRotate: prefer unpack-based algorithm
ClosedPublic

Authored by Nekotekina on Apr 24 2023, 8:44 AM.

Details

Summary

Splitting and improving from the https://reviews.llvm.org/D146357

When running tests for LowerShift, I discovered some poor codegen in rotate and funnel shift tests. This patch attempts to address some of them.

Using unpack for splitting and using double-bitwidth shifts may improve performance according to https://uica.uops.info tests.

  1. No cross-lane shuffles
  2. No dirtying double-width registers
  3. Massive improvement for AVX2 rotates in some cases (var_funnnel_v8i16, var_funnnel_v16i16) — because unpack is currently only used for vXi8 vectors.

Diff Detail

Event Timeline

Nekotekina created this revision.Apr 24 2023, 8:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2023, 8:44 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
Nekotekina requested review of this revision.Apr 24 2023, 8:44 AM

Added some code analyser results.

llvm/test/CodeGen/X86/min-legal-vector-width.ll
1942

Code analyser:
SKX version (left): https://bit.ly/3NamuO5
VBMI version (left): https://bit.ly/444ZUfB

Common (right): https://bit.ly/3n7q4xQ
Unpack version looks better despite more instructions.

llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
379

Even less instructions than on the left

1549

LGTM

@pengfei @LuoYuanke Are you happy with the performance vs code size trade off?

goldstein.w.n added inline comments.
llvm/test/CodeGen/X86/min-legal-vector-width.ll
1942

This seems potentially not worth it for ICX unless its in a tight inner loop.

LGTM

@pengfei @LuoYuanke Are you happy with the performance vs code size trade off?

Sure. Good to see it improves the performance. Thanks, @Nekotekina and @RKSimon.

The summary may need to describe the purpose or functionality for this patch.

Nekotekina edited the summary of this revision. (Show Details)Apr 28 2023, 1:37 AM
RKSimon accepted this revision.Apr 28 2023, 8:22 AM

LGTM

This revision is now accepted and ready to land.Apr 28 2023, 8:22 AM

@Nekotekina Please can you create the LowerFunnelShift patch next?

@RKSimon Yes, I was going to do so.

@RKSimon Is there something else I could do?

@RKSimon Is there something else I could do?

(sorry I was at EuroLLVM and I am slowly catching up) - do you have commit access? If not, please can you post your email address and I'll commit on your behalf

@RKSimon I see, my email is nekotekina@gmail.com

This revision was landed with ongoing or failed builds.May 15 2023, 3:26 AM
This revision was automatically updated to reflect the committed changes.