[X86][BtVer2] Improved latency and throughput of float/vector loads and stores.
This patch introduces the following changes to the btver2 scheduling model:
- The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores).
- Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy).
- Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX).
I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those
are dispatched to the FPU but not really issues on JFPU01.
Differential Revision: https://reviews.llvm.org/D68871