
[X86][BtVer2] Improved latency and throughput of float/vector loads and stores.
Closed, Public

Authored by andreadb on Oct 11 2019, 8:09 AM.

Details

Summary

This patch introduces the following changes to the btver2 scheduling model:

The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores).

Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy).

Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX).

I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. They are dispatched to the FPU but not really issued on JFPU01.
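To illustrate why the uop count and AGU cycle changes matter, here is a rough sketch of how a scheduling model's numbers turn into a reciprocal-throughput estimate. It is not the patch itself and not how llvm-mca is implemented. It assumes a dispatch width of 2, a single load AGU named JLAGU, and it ignores the JFPU01/JFPX consumption that the old model also declared. The function name and the dictionaries are hypothetical.

```python
from fractions import Fraction


def reciprocal_throughput(num_uops, resource_cycles, units, dispatch_width):
    """Estimate steady-state cycles per instruction for a stream of
    identical, independent instructions.

    The estimate is the larger of two bounds:
      - the dispatch bound: uops per instruction / dispatch width
      - the resource bound: the busiest resource's cycles / its unit count
    """
    dispatch_bound = Fraction(num_uops, dispatch_width)
    resource_bound = max(
        (Fraction(cycles, units[name]) for name, cycles in resource_cycles.items()),
        default=Fraction(0),
    )
    return max(dispatch_bound, resource_bound)


# Assumed machine parameters (illustrative, not taken from the patch).
UNITS = {"JLAGU": 1}   # assumption: one load AGU
DISPATCH_WIDTH = 2     # assumption: two uops dispatched per cycle

# Old description of a YMM load: 1 uop, 1 AGU cycle.
old_ymm_load = reciprocal_throughput(1, {"JLAGU": 1}, UNITS, DISPATCH_WIDTH)

# New description of a YMM load: 2 uops, 2 AGU cycles.
new_ymm_load = reciprocal_throughput(2, {"JLAGU": 2}, UNITS, DISPATCH_WIDTH)

print(f"old YMM load estimate: {old_ymm_load} cycle(s) per instruction")
print(f"new YMM load estimate: {new_ymm_load} cycle(s) per instruction")
```

Under these assumptions the AGU is the limiting resource in both cases, so doubling the AGU cycles doubles the estimated cost of a YMM load.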


Event Timeline

andreadb created this revision. Oct 11 2019, 8:09 AM

Posted the output from llvm-exegesis for all the affected instructions.

RKSimon accepted this revision. Oct 13 2019, 4:29 AM

LGTM - thanks for looking into this

This revision is now accepted and ready to land. Oct 13 2019, 4:29 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Oct 14 2019, 4:14 AM
Herald added a subscriber: hiraditya.