This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD
ClosedPublic

Authored by craig.topper on Mar 14 2019, 3:10 PM.

Details

Summary

Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.

When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.

This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.

Fixes PR41055

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Mar 14 2019, 3:10 PM
RKSimon added a subscriber: andreadb.

I've been toying with the idea of using a pseudo for all SHLD/SHRD cases so we can make it easier to select between that and the expanded shift pattern depending on scheduler-model/register-pressure etc. instead of trying to make the decision in DAG with the feature bits. I don't know if this could be a first step towards this? @andreadb Any thoughts?

I've been toying with the idea of using a pseudo for all SHLD/SHRD cases so we can make it easier to select between that and the expanded shift pattern depending on scheduler-model/register-pressure etc. instead of trying to make the decision in DAG with the feature bits. I don't know if this could be a first step towards this? @andreadb Any thoughts?

I've been toying with the idea of using a pseudo for all SHLD/SHRD cases so we can make it easier to select between that and the expanded shift pattern depending on scheduler-model/register-pressure etc. instead of trying to make the decision in DAG with the feature bits. I don't know if this could be a first step towards this? @andreadb Any thoughts?

I agree that this could definitely be a first step towards implementing scheduler-driven peephole optimizations.

That being said, this is easier said than done. Coming up with a good (and generic) framework and a decent cost model for doing this is not simple at all... But it is definitely an interesting project: we could start with a simple prototype pass which is then evolve over time. Just an idea.

To start, it would be nice to have a pass that checks if it is profitable to convert a SHLD/SHRD based on the feedback of a simple cost model which internally takes into account potential increases/decreases in register pressure as well as potential perf gains. Something similar in principle to what we already do in passes like TwoAddressInstruction, where heuristics are used to identify cases where it is profitabile to commute/convert instructions (for example: based for example on the number of users of a definition/intervening instructions/etc.).

Ping. Is this ok by itself?

RKSimon accepted this revision.Mar 27 2019, 8:51 AM

Ping. Is this ok by itself?

LGTM - sorry for sidetracking this ticket!

This revision is now accepted and ready to land.Mar 27 2019, 8:51 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2019, 10:30 AM