Based on:
- AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions,
- AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions,
- https://en.wikipedia.org/wiki/XOP_instruction_set
XOP appears to be supported only by AMD family 15h processors, i.e. bdver1
through bdver4, for which LLVM currently has no scheduling models.
Don't duplicate entries; put all the memory folds together: