This is an archive of the discontinued LLVM Phabricator instance.

[X86][Skylake] Adding the scheduling information for the SkylakeClient target
ClosedPublic

Authored by gadi.haber on Aug 30 2017, 5:58 AM.

Details

Summary

This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879.

Please expect some performance fluctuations due to code alignment effects.

Diff Detail

Repository
rL LLVM

Event Timeline

gadi.haber created this revision.Aug 30 2017, 5:58 AM
craig.topper edited edge metadata.Sep 3 2017, 9:34 AM

This seems to contain 128/256 bit AVX512VL instructions, but the 512 bit versions?

Good point. All AVX512 ISA is disabled in SKL. Will update the diff file.

gadi.haber updated this revision to Diff 113732.Sep 4 2017, 2:36 AM

Updated diff after removing all AVX512 instructions

RKSimon edited edge metadata.Sep 4 2017, 9:09 AM

Why the file mode property changes?

RKSimon added inline comments.Sep 4 2017, 10:08 AM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

I feels tests like this shouldn't change with schedule model tweaks - it's supposed to be a codegen test not a target specific test.

Would it be possible to reduce the use of -mcpu from tests that aren't actually testing for scheduling?

gadi.haber added inline comments.Sep 5 2017, 4:06 AM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

Sure. I can modify the tests. IS there a list of such tests that we can assume to tests only code gen?

RKSimon added inline comments.Sep 5 2017, 4:32 AM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

Most of the non *-schedule.ll ones here for instance, the vector-shift-*.ll are another set I can remember kept changing.

gadi.haber added inline comments.Sep 5 2017, 4:52 AM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

OK. In that case I suggest to abandon this review. I will modify the tests and commit them as NFC and the re-upload it for review after the commit.

RKSimon added inline comments.Sep 5 2017, 5:07 AM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

You don't need to do that - just fix the tests separately and then rebase this patch

gadi.haber added inline comments.Sep 5 2017, 6:09 AM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

ok.

gadi.haber added inline comments.Sep 7 2017, 9:51 PM
test/CodeGen/X86/avx2-intrinsics-x86.ll
1625 ↗(On Diff #113732)

These are the attributes I though of using instead of -mcpu=skx:
-mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq

This is the attribute to use instead of -mcpu=knl:
-mattr=+avx512f

There are roughly 100 tests affected by this.

Any thoughts?

Updated after the following changes:

  1. modifying 3 Codegen tests to use the -mattr flag instead of the -mcpu flags.
  2. removing remaining AVX512 instructions for mask registers manipulations in X86SchedSkylake.td
  3. Changed the load latency from 4 cycles in HSW to 5 cycles.
  4. Better regrouping some of the instructions in X86SchedSkylake.td

Sorry to hijack this patch, but does anyone have any ideas how we can increase test coverage for the instructions we're now scheduling for? I've been slowly adding sse/avx/bmi instructions when I find the time (and I'm hoping to get fma/avx2 done soon) - those are time consuming to do but pretty trivial.

It's a lot trickier to ensure we have coverage for the various x86 core instructions, especially all the various cases of ancient x86 instructions, 8/16-bit integers, x87 etc. which are being included in these new models but not tested.

Would we be best off just adding support for instruction schedule comments to inline asm or is there a better way?

In fact, my next task after committing the SKL + SKX scheduling is to extend the MC tests to include all encodings coverage of all the instructions starting SNB.
Not sure if ALL encoding variations with all register types 8,16,32,64 will be included but at the moment I was able to gather more than 10K encodings coverage for about 5K LLVM opcodes.

As an example here are the encodings coverage I have for the MULX32rm opcode:

[0xc4,0x62,0x13,0xf6,0x2c,0x25,0xf0,0x1c,0xf0,0x1c] mulx 485498096, %r13d, %r13d
[0xc4,0x62,0x13,0xf6,0x2a] mulx (%rdx), %r13d, %r13d
[0xc4,0x62,0x13,0xf6,0x6a,0x40] mulx 64(%rdx), %r13d, %r13d
[0xc4,0x62,0x13,0xf6,0x6c,0x02,0x40] mulx 64(%rdx,%rax), %r13d, %r13d
[0xc4,0x62,0x13,0xf6,0x6c,0x82,0x40] mulx 64(%rdx,%rax,4), %r13d, %r13d
[0xc4,0x62,0x13,0xf6,0x6c,0x82,0xc0] mulx -64(%rdx,%rax,4), %r13d, %r13d

I think I've added all the AVX2 instructions to avx2-schedule.ll now, so this patch needs to be updated.

I also noticed that Haswell/Skylake are both missing costs for the gather instructions

Yes I'm working to add the GATHER instructions.

Updated diff after rebase codegen tests + added VGATHER schedulings.

Updated diff after rebase codegen tests + added VGATHER schedulings.

The gather scheduling values seem to have disappeared completely from avx2-scheule.ll ?

Updated diff after fixing scheduling information was missing from avx2-schedule.ll test due to a typo bug in the X86SchedSkylakeCleint.td

good catch on the missing scheduling for gather instructions!

corrected updated diff to use git diff HEAD^

Please note that apart from this scheduling patch we have only one more large patch left - the SkylakeServer schedulings, which I like to submit for review as soon as this one is committed.

Please note that apart from this scheduling patch we have only one more large patch left - the SkylakeServer schedulings, which I like to submit for review as soon as this one is committed.

For that we need decent scheduling test coverage of avx512 instructions, which if possible should to be done before the skx model is pushed for review to prevent all the missed cases we've had in the past.

Yes. Make sense. I will start with extending the avx512 scheduling test coverage.

If you do not see standing issues, please accept the review so we can focus on SKylakeServer. Thanx!

RKSimon accepted this revision.Sep 18 2017, 2:46 AM

LGTM

This revision is now accepted and ready to land.Sep 18 2017, 2:46 AM
This revision was automatically updated to reflect the committed changes.
lib/Target/X86/X86.td