[X86][Haswell]: Updating the scheduling information for the Haswell subtarget.
Closed, Public

Authored by gadi.haber on Nov 14 2017, 4:22 AM.

Details

Summary

Updated the scheduling information for the Haswell subtarget with the following changes:

  1. Regrouped the instructions after adding appropriate load + store latencies.
  2. Added scheduling for missing instructions such as the GATHER instrs.

The changes were made after revisiting the latency impact of all memory uOps.

Diff Detail

Repository
rL LLVM

Event Timeline

gadi.haber created this revision. Nov 14 2017, 4:22 AM

Is there a reasonably easy way of confirming whether this makes a difference in some key benchmarks? Maybe example runs through the test suite?

Performance runs are done on three main benchmark suites: SPEC CPU 2017, Geekbench4, and the EEMBC suites (automotive, denbench, coremark-pro, networking, telecom).

Cool -- do you have readily available numbers you can quote in the description? Even in relative terms, just so the casual reader would have an idea why these changes are being made and what the expected effects ought to be.

I imagine also that some people might want to investigate what the effects might be on internal benchmarks, etc.

Unfortunately, I cannot give you the exact numbers.
Overall, across ~900 benchmarks, the new scheduling gives a speedup of ~6%.

RKSimon edited edge metadata. Nov 16 2017, 1:49 AM

The gather tests (avx2-schedule.ll) still report "[1:?]"

Good catch. The old scheduling overrode the new one in the .td file. I will update the diff.

Removed the old scheduling for the GATHER instructions, which had overridden the new entries.
Fixed the overall load latency attribute for HSW from 4 cycles to 5.

What are your thoughts on the retl/retq latencies? Some models use the default (which includes load latency), but others have overridden it. It doesn't make much difference to codegen, but I'm wondering if we need to be consistent.

Good point.
From the tables I have, I could not find any scheduling difference between retl and retq. Both are mapped to:
XED_IFORM_RET_NEAR
latency = 2 cycles + 5 cycles load latency.
3 uOps
ports: 23, 0156, 6

I can add it in.
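
For readers unfamiliar with the notation, the figures quoted above can be combined as in the following minimal Python sketch. It is not code from the patch. It assumes that each digit in a port group names one execution port (so "0156" means any of ports 0, 1, 5, 6) and that each of the three groups corresponds to one of the three uOps; the function and variable names are hypothetical.

```python
from typing import Dict, List

# Figures quoted in the comment above for XED_IFORM_RET_NEAR.
BASE_LATENCY = 2                   # cycles, excluding the load
LOAD_LATENCY = 5                   # cycles, the HSW load latency after the fix
PORT_GROUPS = ["23", "0156", "6"]  # assumed: one port group per uOp


def expand_port_group(group: str) -> List[int]:
    """Turn a port-group string such as "0156" into [0, 1, 5, 6]."""
    return [int(ch) for ch in group]


def summarize(base: int, load: int, groups: List[str]) -> Dict[str, object]:
    """Combine the quoted figures into one record (illustrative only)."""
    return {
        "latency": base + load,                                  # total cycles
        "num_uops": len(groups),                                 # one uOp per group
        "ports_per_uop": [expand_port_group(g) for g in groups],
    }


RET_NEAR = summarize(BASE_LATENCY, LOAD_LATENCY, PORT_GROUPS)
print(RET_NEAR)
```

Under these assumptions the total latency works out to 2 + 5 = 7 cycles across 3 uOps, which is the value a scheduling entry shared by retl and retq would carry.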

Added the retl scheduling information, following Simon's comment.

@gadi.haber please can you rebase?

RKSimon accepted this revision. Dec 7 2017, 6:56 AM

LGTM, thanks

This revision is now accepted and ready to land. Dec 7 2017, 6:56 AM
This revision was automatically updated to reflect the committed changes.