TLDR: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which can result in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.
First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.
As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the LSUnit uses the MayLoad, MayStore, and HasUnmodeledSideEffects flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.
What I've done is add two flags to the InstructionBase class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the isAStoreBarrier(), isALoadBarrier(), setStoreBarrier(), and setLoadBarrier() methods from the same class. In my mind, it is most natural to set these flags within a target's InstrPostProcess::postProcessInstruction() method. An example of this is included in the patch within the newly added X86InstrPostProcess class (within the X86CustomBehaviour.cpp file).
I also added a new command-line argument to llvm-mca (--show-barriers) which adds two additional columns to the InstructionInfoView to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.
I expected this patch to be quite disruptive to the mca test cases, but I only had to update 10 of the test cases (and then I added a new test case that shows the --show-barriers argument in action). I would really appreciate if some people could look through how those test cases were altered to make sure that they haven't been made inaccurate (or at least not more inaccurate). Feel free to add more reviewers if you'd like target specific developers to take a look at the updated tests.
Each of those 10 test cases have 1 or more instructions that the LSUnit used to conservatively assume were load and/or store barriers. I've listed those instructions below. For some of these test cases, this patch affected their resource pressure statistics. This is not something that I am very familiar with so I'd also really appreciate if someone could make sure that these changes to the resource pressure stats aren't moving in the wrong direction.
AArch64/Cortex/A55-load-store-noalias.s:
nop
AMDGPU/gfx9-retireooo.s:
flat_load_dword (This is the test case that motivated this patch so I'm fairly confident this one shouldn't be treated as a barrier.)
X86/Barcelona/store-throughput.s :
movd
X86/BdVer2/load-store-throughput.s:
movd
X86/BdVer2/pr37790.s:
int3 and stmxcsr
X86/BdVer2/store-throughput.s:
movd
X86/BtVer2/pr37790.s:
int3 and stmxcsr
X86/BtVer2/stmxcsr-ldmxcsr.s:
stmxcsr and ldmxcsr
X86/Haswell/reserved-resources.s:
fxrstor
X86/Haswell/stmxcsr-ldmxcsr.s:
stmxcsr and ldmxcsr
If any of these instructions should be load and/or store barriers, let me know and I can update this patch to set the appropriate flag(s) for those instructions.
Thank you for your time and thanks in advance for any of your input, suggestions, criticisms, or questions. And sorry again for having this patch take so long.
You also need to test SFENCE here.
You can also get rid of most instructions here. Personally, I would only leave SFENCE MFENCE LFENCE and any other instructions with "unmodeled side effects". So, CLFLUSH, LFENCE, MFENCE and SFENCE are fine. All other SSE instructions can be removed in my opinion