This is an archive of the discontinued LLVM Phabricator instance.

[llvm-mca] [LSUnit] Proposal for declaring memory-barrier instructions explicitly rather than making conservative assumptions.
ClosedPublic

Authored by holland11 on Jan 6 2022, 5:33 PM.

Details

Summary

TLDR: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which can result in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.

First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.

As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the LSUnit uses the MayLoad, MayStore, and HasUnmodeledSideEffects flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.

What I've done is add two flags to the InstructionBase class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the isAStoreBarrier(), isALoadBarrier(), setStoreBarrier(), and setLoadBarrier() methods from the same class. In my mind, it is most natural to set these flags within a target's InstrPostProcess::postProcessInstruction() method. An example of this is included in the patch within the newly added X86InstrPostProcess class (within the X86CustomBehaviour.cpp file).

I also added a new command-line argument to llvm-mca (--show-barriers) which adds two additional columns to the InstructionInfoView to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.

I expected this patch to be quite disruptive to the mca test cases, but I only had to update 10 of the test cases (and then I added a new test case that shows the --show-barriers argument in action). I would really appreciate if some people could look through how those test cases were altered to make sure that they haven't been made inaccurate (or at least not more inaccurate). Feel free to add more reviewers if you'd like target specific developers to take a look at the updated tests.

Each of those 10 test cases have 1 or more instructions that the LSUnit used to conservatively assume were load and/or store barriers. I've listed those instructions below. For some of these test cases, this patch affected their resource pressure statistics. This is not something that I am very familiar with so I'd also really appreciate if someone could make sure that these changes to the resource pressure stats aren't moving in the wrong direction.

AArch64/Cortex/A55-load-store-noalias.s:
nop

AMDGPU/gfx9-retireooo.s:
flat_load_dword (This is the test case that motivated this patch so I'm fairly confident this one shouldn't be treated as a barrier.)

X86/Barcelona/store-throughput.s :
movd

X86/BdVer2/load-store-throughput.s:
movd

X86/BdVer2/pr37790.s:
int3 and stmxcsr

X86/BdVer2/store-throughput.s:
movd

X86/BtVer2/pr37790.s:
int3 and stmxcsr

X86/BtVer2/stmxcsr-ldmxcsr.s:
stmxcsr and ldmxcsr

X86/Haswell/reserved-resources.s:
fxrstor

X86/Haswell/stmxcsr-ldmxcsr.s:
stmxcsr and ldmxcsr

If any of these instructions should be load and/or store barriers, let me know and I can update this patch to set the appropriate flag(s) for those instructions.

Thank you for your time and thanks in advance for any of your input, suggestions, criticisms, or questions. And sorry again for having this patch take so long.

Diff Detail

Event Timeline

holland11 created this revision.Jan 6 2022, 5:33 PM
holland11 requested review of this revision.Jan 6 2022, 5:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2022, 5:33 PM
holland11 edited the summary of this revision. (Show Details)Jan 6 2022, 5:33 PM
TLDR: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which can result in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.

Thanks for working on this!

First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.

Don't need to apologise! I wish there were more contributors like you :)
This patch is definitely still relevant. Thanks for contributing it!

As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the LSUnit uses the MayLoad, MayStore, and HasUnmodeledSideEffects flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.

What I've done is add two flags to the InstructionBase class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the isAStoreBarrier(), isALoadBarrier(), setStoreBarrier(), and setLoadBarrier() methods from the same class. In my mind, it is most natural to set these flags within a target's InstrPostProcess::postProcessInstruction() method. An example of this is included in the patch within the newly added X86InstrPostProcess class (within the X86CustomBehaviour.cpp file).

Sounds good to me!

I also added a new command-line argument to llvm-mca (--show-barriers) which adds two additional columns to the InstructionInfoView to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.

Thanks for adding that flag!

I had a look at the patch and it looks good modulo a couple of nits.

The changes to the x86 tests make sense. I only had a couple of nits (see my other comments below).
I trust you that the CMake changes work fine. :)

llvm/test/tools/llvm-mca/X86/barrier_output.s
1–16

You also need to test SFENCE here.
You can also get rid of most instructions here. Personally, I would only leave SFENCE MFENCE LFENCE and any other instructions with "unmodeled side effects". So, CLFLUSH, LFENCE, MFENCE and SFENCE are fine. All other SSE instructions can be removed in my opinion

llvm/tools/llvm-mca/Views/InstructionInfoView.h
58

Can this be something like this?

using UniqueInst = std::unique_ptr<mca::Instruction>;
ArrayRef<UniqueInst> LoweredInsts;
llvm/tools/llvm-mca/llvm-mca.cpp
555–557

I wonder if we should use SmallVector for the LoweredSequence (instead of a std::vector).
The average code snippet size tends to be very small. So, a SmallVector might perform a bit better.

The constructor of InstructionInfoView should probably use an ArrayRef for the LoweredSequence. This would be similar to what we already do for the SourceMgr.

Forgot to mention that this work is related to issue: https://github.com/llvm/llvm-project/issues/36015

We need to remember to resolve that issue once this patch goes in.

holland11 updated this revision to Diff 398316.Jan 7 2022, 11:41 PM

@andreadb Thanks for the feedback / suggestions. They all made sense and I implemented each of them. Let me know if they look how you expected them to.

andreadb accepted this revision.Jan 8 2022, 4:08 AM

@andreadb Thanks for the feedback / suggestions. They all made sense and I implemented each of them. Let me know if they look how you expected them to.

It looks good to me.

Thanks Patrick!

This revision is now accepted and ready to land.Jan 8 2022, 4:08 AM