[X86][BtVer2] Improved latency and throughput of float/vector loads and stores.

Description

This patch introduces the following changes to the btver2 scheduling model:

  • The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores).
  • Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy).
  • Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX).
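As a rough illustration of why the extra AGU cycle matters, the sketch below estimates reciprocal throughput as the largest "resource cycles / number of units" ratio across the resources a write consumes. It is not the TableGen from this patch: the `WriteModel` class, the unit counts in `UNITS`, and the one-cycle JFPU01/JFPX usage in the "before" entry are assumptions made only for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WriteModel:
    """Hypothetical stand-in for one scheduling-model write entry."""
    num_micro_ops: int
    resource_cycles: Dict[str, int]  # resource name -> cycles consumed


# Assumed unit counts, for illustration only: one AGU for loads and
# two units behind each of the JFPU01 and JFPX groups.
UNITS = {"AGU": 1, "JFPU01": 2, "JFPX": 2}


def reciprocal_throughput(write: WriteModel, units: Dict[str, int]) -> float:
    """Return the bottleneck ratio: max over resources of cycles / units."""
    return max(cycles / units[name]
               for name, cycles in write.resource_cycles.items())


# YMM load before the patch: 1 micro-op, 1 AGU cycle, plus FPU pipe usage
# (the one-cycle JFPU01/JFPX figures are assumed).
ymm_load_before = WriteModel(1, {"AGU": 1, "JFPU01": 1, "JFPX": 1})

# YMM load after the patch: 2 micro-ops, 2 AGU cycles, no FPU pipes.
ymm_load_after = WriteModel(2, {"AGU": 2})

if __name__ == "__main__":
    print("before:", reciprocal_throughput(ymm_load_before, UNITS))
    print("after: ", reciprocal_throughput(ymm_load_after, UNITS))
```

Under these assumptions the AGU is the bottleneck in both cases, so doubling its resource cycles doubles the estimated reciprocal throughput of a YMM load, while dropping JFPU01 and JFPX stops the load from competing with real FPU work.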

I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe: they
are dispatched to the FPU but not actually issued on JFPU01.

Differential Revision: https://reviews.llvm.org/D68871
