
[X86][F16C] Update instruction scheduling on btver2

Authored by avt77 on Oct 18 2017, 6:10 AM.



This patch adds correct scheduling numbers for the F16C instructions on the btver2 CPU.

Event Timeline

avt77 created this revision. Oct 18 2017, 6:10 AM
RKSimon added inline comments. Oct 18 2017, 7:25 AM
395 ↗(On Diff #119471)

I think 'ResourceCycles = [1];' is the default so you should be able to drop this?

403 ↗(On Diff #119471)

Split off the cvtph2ps instructions from cvtps2ph - the load/store cases are definitely different and it makes little sense to keep the rr cases together.

405 ↗(On Diff #119471)

Shouldn't the JFPU1 case be JFPU01? The AMD docs say 'STC,FPA|FPM'.

412 ↗(On Diff #119471)

Shouldn't the JFPU1 case be JFPU01?

426 ↗(On Diff #119471)

cvtph2ps is a load, so the JLAGU is the first stage, not the last.

48 ↗(On Diff #119471)

This should be [8:1.00]

106 ↗(On Diff #119471)

These should be [8:2.00] and [3:2.00]?
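
The bracketed values in the comments above appear to follow the `[latency:reciprocal throughput]` form used for scheduling annotations in the test's expected assembly. As a rough illustration only (this is not LLVM code), the Python sketch below formats such an annotation from a latency and per-resource cycle counts. The function name `sched_annotation`, the choice of `JFPU1` as the resource, and the rule that reciprocal throughput is the largest cycles-per-unit ratio are all assumptions made for the example.

```python
def sched_annotation(latency, resource_cycles, units=None):
    """Format a '[latency:rthroughput]' string (hypothetical helper).

    resource_cycles: dict mapping a resource name to the cycles it is busy.
    units: dict mapping a resource name to the number of copies (default 1).
    """
    units = units or {}
    # Assumption: the busiest resource bounds how often the instruction
    # can be issued, so it determines the reciprocal throughput.
    rthroughput = max(
        cycles / units.get(name, 1) for name, cycles in resource_cycles.items()
    )
    return f"[{latency}:{rthroughput:.2f}]"


print(sched_annotation(8, {"JFPU1": 1}))  # [8:1.00]
print(sched_annotation(8, {"JFPU1": 2}))  # [8:2.00]
print(sched_annotation(3, {"JFPU1": 2}))  # [3:2.00]
```

Under this reading, changing an annotation from `1.00` to `2.00` corresponds to the instruction occupying its resource for two cycles instead of one.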

avt77 retitled this revision from F16C inructions scheduling on btver2 to [X86][F16C] Update instruction scheduling on btver2. Oct 18 2017, 7:28 AM
avt77 added inline comments. Oct 19 2017, 1:20 AM
405 ↗(On Diff #119471)

Once again, it's a difference between Agner's numbers and the AMD docs :-(

avt77 updated this revision to Diff 119565. Oct 19 2017, 4:31 AM

I fixed the issues raised by Simon.

RKSimon edited edge metadata. Oct 20 2017, 9:37 AM

These latencies/throughputs still don't match the AMD docs. Please match those and not the Agner tests.

404 ↗(On Diff #119565)

VCVTPH2PSrm is a load, not a store, so it can't use WriteCVT3St.

421 ↗(On Diff #119565)


428 ↗(On Diff #119565)

3 + 5 = 8
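
The arithmetic in this comment reads as conversion latency plus load latency for the memory-source form. A minimal sketch of that composition follows, assuming a 3-cycle conversion and a 5-cycle load. Which term is which is inferred from the rest of the thread, and the names are hypothetical rather than taken from the patch.

```python
# Assumed values, inferred from the review thread rather than from the patch.
LOAD_LATENCY = 5  # cycles for the load part
CVT_LATENCY = 3   # cycles for the conversion itself


def folded_load_latency(op_latency, load_latency=LOAD_LATENCY):
    """Latency of an instruction whose source operand comes from memory."""
    # The load must complete before the operation can start, so the two add.
    return load_latency + op_latency


print(folded_load_latency(CVT_LATENCY))  # prints 8
```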

avt77 updated this revision to Diff 119836. Oct 23 2017, 5:17 AM

All numbers are from AMD docs now.

Rebase this?

avt77 updated this revision to Diff 120000. Oct 23 2017, 11:58 PM

I rebased f16c-schedule.ll.

I'm not sure about the WriteCVT3St latency: is it really 3?

RKSimon accepted this revision. Oct 24 2017, 2:50 AM

I'm not sure about the WriteCVT3St latency: is it really 3?

Stores are tricky as we can't easily model the time until the value is written out to memory. The best we can do is model just the cycles for the conversion, after which the store disappears into the memory queue. This means that spill/reload round trips can't be easily modelled, but then we don't handle store-to-load forwarding (STLF) timings either.

LGTM with one minor.

438 ↗(On Diff #120000)

This should be Latency = 8

This revision is now accepted and ready to land. Oct 24 2017, 2:50 AM
This revision was automatically updated to reflect the committed changes.