Page MenuHomePhabricator

[InstCombine] matchRotate - support (uniform) constant rotation amounts (PR46895)
ClosedPublic

Authored by RKSimon on Sep 10 2020, 6:45 AM.

Details

Summary

This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.

Diff Detail

Event Timeline

RKSimon created this revision.Sep 10 2020, 6:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 10 2020, 6:45 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon requested review of this revision.Sep 10 2020, 6:45 AM

any comments on this?

RKSimon updated this revision to Diff 291597.Sep 14 2020, 9:51 AM

fix clang-tidy warnings

nikic added inline comments.Sep 14 2020, 1:29 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2115

Probably need to either guard or assert against invalid shift amounts here. Don't want to combine -33 and +1.

llvm/lib/Transforms/Utils/Local.cpp
2910 ↗(On Diff #291597)

mount -> amount

2922 ↗(On Diff #291597)

Don't we need to check that the fsh operands are the same, or at least LHS->Provider and RHS->Provider are?

2924 ↗(On Diff #291597)

I suspect this is inverted, but maybe I misunderstand the direction of the mapping. It would be good to at least add some tests where the fsh amount is not exactly half the bitwidth.

Result = fshl(LHS, RHS, 1)
Result[0] = RHS[31]
Result[1] = LHS[0]
...
Result[31] = LHS[30]
RKSimon updated this revision to Diff 292490.Sep 17 2020, 7:19 AM

Fix the -ve shift amount issue - still looking at better testing of fshl/fshr inside collectBitParts

RKSimon planned changes to this revision.Sep 17 2020, 7:19 AM

@nikic I've pulled (and fixed) the Local.cpp changes out to D88292 for review first.

RKSimon updated this revision to Diff 294405.Fri, Sep 25, 12:45 PM
RKSimon edited the summary of this revision. (Show Details)

rebase now that D88292 has landed

RKSimon edited the summary of this revision. (Show Details)Fri, Sep 25, 12:45 PM
nikic accepted this revision.Fri, Sep 25, 12:57 PM

LGTM

This revision is now accepted and ready to land.Fri, Sep 25, 12:57 PM
This revision was landed with ongoing or failed builds.Fri, Sep 25, 2:06 PM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Fri, Sep 25, 3:12 PM

There's a 40% code size increase on CMakeFiles/7zip-benchmark.dir/CPP/7zip/Crypto/Sha1.cpp.o, might be worth double checking that we're not missing some optimizations on funnel shifts.

RKSimon added a comment.EditedSat, Sep 26, 3:41 AM

There's a 40% code size increase on CMakeFiles/7zip-benchmark.dir/CPP/7zip/Crypto/Sha1.cpp.o, might be worth double checking that we're not missing some optimizations on funnel shifts.

Looking at this now: https://wide.godbolt.org/z/h9GoTo

There's a 40% code size increase on CMakeFiles/7zip-benchmark.dir/CPP/7zip/Crypto/Sha1.cpp.o, might be worth double checking that we're not missing some optimizations on funnel shifts.

Looking at this now: https://wide.godbolt.org/z/h9GoTo

I think I've worked out whats happened - we've replaced a 3 x instruction pattern (which was later matched to a single rotate instruction in the DAG) with 1 intrinsic (which matched to the same single rotate instruction) - which seems to have allowed a couple of core loops to be unrolled further/completely - leading to the code size increase.