Page MenuHomePhabricator

[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms

Authored by craig.topper on May 23 2019, 5:27 PM.



This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.

Diff Detail


Event Timeline

craig.topper created this revision.May 23 2019, 5:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2019, 5:27 PM
craig.topper marked an inline comment as done.May 23 2019, 5:28 PM
craig.topper added inline comments.
1163 ↗(On Diff #201105)

I didn't like the reference to ResGroup30 since someone could easily rearrange numbers and not know this was dependent. So I copied the class and gave it a better name.

courbet accepted this revision.May 24 2019, 2:35 AM


1163 ↗(On Diff #201105)

Makes sense.

This revision is now accepted and ready to land.May 24 2019, 2:35 AM

pre-commit the zero idiom tests?

1606 ↗(On Diff #201105)

Haswell and Broadwell?

1859 ↗(On Diff #201105)

Haswell and Broadwell?

1747 ↗(On Diff #201105)


2463 ↗(On Diff #201105)


andreadb accepted this revision.May 24 2019, 4:05 AM

LGTM (modulo the changes requested by Simon).

Thanks Craig!

About PR41982:

It would be really nice if we could teach llvm-mca how to identify dependency-breaking idioms.
That could be done in a follow-up patch by adding some extra tablegen definitions to those scheduling models.
For example, on BtVer2 we have something like this:

def : IsZeroIdiomFunction<[
  // GPR Zero-idioms.
  DepBreakingClass<[ SUB32rr, SUB64rr, XOR32rr, XOR64rr ], ZeroIdiomPredicate>

In general, we can propagate information about dependency-breaking instructions to llvm-mca by instantiating tablegen class IsZeroIdiomFunction and IsDepBreakingFunction.
llvm-mca uses that information to accurately identify register dependencies.

Obviously, this is entirely optional; at the moment, only llvm-mca uses those definitions. Personally, I am already very happy to see this patch.
But I think that at some point we should consider adding those extra definitions in order to fully address PR41982.


This revision was automatically updated to reflect the committed changes.