Page MenuHomePhabricator

[MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as depedent (PR45793).
ClosedPublic

Authored by andreadb on May 4 2020, 12:10 PM.

Details

Summary

This fixes a regression introduced by a very old commit 280ac1fd1dc35 (was llvm-svn 361950).

Commit 280ac1fd1dc35 redesigned the logic in the LSUnit with the goal of speeding up isReady() queries, and stabilising the LSUnit API (while also making the load store unit more customisable).

The concept of MemoryGroup (effectively an alias set) was added by that commit to better describe dependencies between memory operations. However, that concept was not just used to describe simple alias dependencies, but it was also used for describing memory "order" dependencies (enforced by the memory consistency model).
Instructions of a same memory group were considered "equivalent" as in: independent operations that can potentially execute in parallel.
The problem was that the cost of a dependency (in terms of number of cycles) is different if the instruction is in a "order" dependency, and simply has to wait for the predecessor to be "issued" on a pipeline (rather than being fully executed). For simple "order" dependencies, this was effectively introducing an artificial delay on the "issue" of independent loads and stores.

This patch fixes the issue and adds a new test named 'independent-load-stores.s' to a bunch of x86 targets. That test contains the reproducible posted by Fabian Ritter on PR45793.

I had to rerun the update-mca-tests script on several files. To avoid expected regressions on some Exynos tests, I have added a -noalias=false flag (to match the old strict behavior on latencies).

Some tests for processor Barcelona are fixed by this change and they now show better results.
In a few tests we were incorrectly counting the time spent by instructions in a scheduler queue (this was also caused by the issue on the delayed start of execution for loads and stores).
In one case in particular we now correctly see a store executed out of order. That test was affected by the same underlying issue reported as PR45793.

Another test related to store barriers has improved as a result of this change. Instuction int3 is treated by llvm-mca as a full memory barrier (since it mayload/maystore and has unmodelled side effects). So, memory instructions coming after it had to wait until int3 was effectively executed. This was not happening before. This is issue now as a consequence of the rewrite from this patch.

Let me know if OK to commit.
Thanks

Diff Detail

Event Timeline

andreadb created this revision.May 4 2020, 12:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2020, 12:10 PM
andreadb updated this revision to Diff 261907.May 4 2020, 12:34 PM

Patch updated. This time with full context.

mattd added a comment.May 4 2020, 12:41 PM
This comment was removed by mattd.
llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
61

s/unsigned/size_t/ because SmallVectorBase::size returns a size_t.

mattd added a comment.May 4 2020, 1:05 PM

I'm totally cool with this change; however, it's been a while since I've taken a look at this part of MCA. I'll let other's chime in as well, but +1 from me.

llvm/lib/MCA/HardwareUnits/LSUnit.cpp
69

This function is getting pretty large. I'm not sure how to break this up, but it could be useful as a NFC follow-up at a later time.

131

Is !ImmediateLoadDominator necessary here? It seems that would be implied by the next condition: ImmediateLoadDominator <= CurrentStoreGroupID However, I suppose this does read clearer by leaving the !ImmediateLoadDominator check in place..

andreadb marked 6 inline comments as done.May 4 2020, 1:21 PM

I'm totally cool with this change; however, it's been a while since I've taken a look at this part of MCA. I'll let other's chime in as well, but +1 from me.

Thanks Matt :-)

llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
61

Right. Ideally it should return a size_t. I guess it hardly would be a problem in practice. But you are right. I will fix it.

119

To clarify.
The reason why there is this early exit here is because we only need to update the information about a critical "memory" dependency if there is memory aliasing.

llvm/lib/MCA/HardwareUnits/LSUnit.cpp
69

Definitely a good suggestion for a follow-up. :-)
This logic can probably be split and moved into a few helper methods/functions.

95

Note that, by construction CurrentStoreBarrierGroupID is always less than or equal to CurrentStoreGroupID.
This check was missing in the original code from commit 280ac1fd1dc35. And this is exactly the reason why we have a difference in test pr37790.s

105

This check is to avoid that we add a store dependency twice (since a store can also be a store barrier).

131

The idea is to identify cases where we need a new node in the dependency graph.

These are the cases identified by this check:
a) This is a memory barrier (by construction we always require that barriers are assigned to different memory group);
b) This is the very first load dispatched to the LSUnit (by construction we always keep loads and stores into separate groups. So we need to start a new load group).
c) There is an intervening store between the last load dispatched to the LSU and this load;
d) There is no intervening store. However the last load group has already started execution (so we need a new node).

mattd added inline comments.May 4 2020, 1:46 PM
llvm/lib/MCA/HardwareUnits/LSUnit.cpp
96

nit: You call this StoreGroup here, but StGroup a few blocks below. I really do not care that there is a difference of a few characters, but I could not silence my inner dialog.

131

Those points would make for a great comment! Thanks for the clarification.

andreadb updated this revision to Diff 261945.May 4 2020, 3:38 PM
andreadb marked 5 inline comments as done.

Addressed review comments.

mattd accepted this revision.May 4 2020, 3:42 PM
This revision is now accepted and ready to land.May 4 2020, 3:42 PM
This revision was automatically updated to reflect the committed changes.