This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758)
ClosedPublic

Authored by RKSimon on May 11 2019, 1:18 PM.

Details

Summary

D61068 handled vector shifts; this patch does the same for scalar shifts on targets where shifts run on a similar number of execution pipes as bit ops.

This is true almost exclusively of AMD targets, where the scalar ALUs are well balanced.

Keeping the shift pair avoids an AND with an immediate mask, which usually reduces encoding size.

Some tests show use of a (slow, scaled) LEA instead of SHL in some cases, but that's due to the particular shift immediates; shift+mask sequences generate these just as easily.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. May 11 2019, 1:18 PM
Herald added a project: Restricted Project. May 11 2019, 1:18 PM
RKSimon updated this revision to Diff 199158. May 11 2019, 1:30 PM
RKSimon edited the summary of this revision.

rebase

lebedev.ri added inline comments. May 13 2019, 2:12 PM
lib/Target/X86/X86.td
430 ↗(On Diff #199158)

This is talking about 'scalar', but the hook is '..Constant..'.
Isn't this target feature specifically about fast shifts by immediate?

RKSimon marked an inline comment as done. May 14 2019, 1:49 AM
RKSimon added inline comments.
lib/Target/X86/X86.td
430 ↗(On Diff #199158)

The scalar/vector feature flags indicate that shifts have a similar throughput to ANDs.

lebedev.ri accepted this revision. May 14 2019, 2:18 AM

LG
I'm not sure why the [SSE] tag is in the subject; should it be [AMD]?

Also, do we want to unfold and (shift x, c1), c2 / shift (and x, c1), c2?
Not sure whether that belongs under !TLI.shouldFoldConstantShiftPairToMask() or a new shouldFoldMaskToConstantShiftPair().

This revision is now accepted and ready to land. May 14 2019, 2:18 AM
This revision was automatically updated to reflect the committed changes.