This fixes a regression introduced by a very old commit 280ac1fd1dc35 (was llvm-svn 361950).
Commit 280ac1fd1dc35 redesigned the logic in the LSUnit with the goal of speeding up isReady() queries and stabilising the LSUnit API (while also making the load/store unit more customisable).
The concept of MemoryGroup (effectively an alias set) was added by that commit to better describe dependencies between memory operations. However, that concept was used to describe not only simple alias dependencies, but also memory "order" dependencies (enforced by the memory consistency model).
Instructions in the same memory group were considered "equivalent", i.e. independent operations that can potentially execute in parallel.
The problem is that the cost of a dependency (in number of cycles) differs when the instruction is only in an "order" dependency: in that case it simply has to wait for its predecessor to be "issued" to a pipeline, rather than to be fully executed. For simple "order" dependencies, this effectively introduced an artificial delay on the "issue" of independent loads and stores.
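A toy model may help illustrate why the two kinds of dependency cost a different number of cycles. This is a sketch, not llvm-mca's LSUnit code: `DepKind`, `MemOp`, `issue_cycles` and `chain` are hypothetical names, and the model assumes that an order dependency lets the successor issue one cycle after its predecessor issues, while an alias dependency makes it wait for the predecessor's full latency.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class DepKind(Enum):
    ORDER = "order"  # only the relative order of issue must be preserved
    ALIAS = "alias"  # the two ops may touch the same memory location


@dataclass
class MemOp:
    name: str
    latency: int  # cycles from issue until the op has fully executed
    # (index of an earlier op in program order, kind of dependency on it)
    deps: List[Tuple[int, DepKind]] = field(default_factory=list)


def issue_cycles(ops: List[MemOp]) -> List[int]:
    """Return the earliest issue cycle of each op, visiting ops in program order."""
    cycles: List[int] = []
    for op in ops:
        earliest = 0
        for pred, kind in op.deps:
            if kind is DepKind.ORDER:
                # Assumption: an order dependency only requires the predecessor
                # to have been issued, so the successor can go in the next cycle.
                ready = cycles[pred] + 1
            else:
                # An alias dependency requires the predecessor to fully execute.
                ready = cycles[pred] + ops[pred].latency
            earliest = max(earliest, ready)
        cycles.append(earliest)
    return cycles


def chain(kind: DepKind, n: int = 3, latency: int = 5) -> List[MemOp]:
    """Build n loads where each one depends on the previous one with `kind`."""
    return [MemOp(f"ld{i}", latency, [(i - 1, kind)] if i else []) for i in range(n)]


print(issue_cycles(chain(DepKind.ORDER)))  # [0, 1, 2]
print(issue_cycles(chain(DepKind.ALIAS)))  # [0, 5, 10]
```

With three loads of latency 5, order dependencies let them issue at cycles 0, 1 and 2; charging the same edges as if the predecessor had to finish executing pushes them to cycles 0, 5 and 10, which is the kind of artificial delay on independent loads and stores described above.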
This patch fixes the issue and adds a new test named 'independent-load-stores.s' to a number of x86 targets. That test contains the reproducer posted by Fabian Ritter on PR45793.
I had to rerun the update-mca-tests script on several files. To avoid expected regressions on some Exynos tests, I have added a -noalias=false flag (to match the old strict behavior on latencies).
Some tests for the Barcelona processor are fixed by this change and now show better results.
In a few tests we were incorrectly counting the time spent by instructions in a scheduler queue (this was also caused by the delayed start of execution for loads and stores).
In one case in particular we now correctly see a store executed out of order. That test was affected by the same underlying issue reported as PR45793.
Another test related to store barriers has improved as a result of this change. Instruction int3 is treated by llvm-mca as a full memory barrier (since it may load, may store, and has unmodelled side effects), so memory instructions coming after it must wait until int3 has executed. This was not happening before; it is now fixed as a consequence of the rewrite in this patch.
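The barrier rule can be sketched with a second toy model. This is not how llvm-mca classifies or schedules instructions internally: `Inst`, `is_full_barrier` and `barrier_issue_cycles` are hypothetical names, the latencies are made-up values, and the model assumes that instruction i is otherwise available to issue at cycle i.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inst:
    name: str
    latency: int  # hypothetical latency, in cycles
    may_load: bool = False
    may_store: bool = False
    has_side_effects: bool = False

    @property
    def is_memory_op(self) -> bool:
        return self.may_load or self.may_store

    @property
    def is_full_barrier(self) -> bool:
        # Hypothetical rule mirroring the description of int3 above.
        return self.may_load and self.may_store and self.has_side_effects


def barrier_issue_cycles(insts: List[Inst]) -> List[int]:
    """Earliest issue cycle of each instruction, in program order.

    Assumption: instruction i becomes available at cycle i; a memory op
    additionally waits until the most recent full barrier has finished executing.
    """
    cycles: List[int] = []
    barrier_done = 0  # cycle at which the latest full barrier completes
    for i, inst in enumerate(insts):
        earliest = i
        if inst.is_memory_op:
            earliest = max(earliest, barrier_done)
        cycles.append(earliest)
        if inst.is_full_barrier:
            barrier_done = earliest + inst.latency
    return cycles


program = [
    Inst("load", 5, may_load=True),
    Inst("int3", 10, may_load=True, may_store=True, has_side_effects=True),
    Inst("store", 1, may_store=True),
    Inst("add", 1),  # not a memory op, so the barrier does not delay it
]
print(barrier_issue_cycles(program))  # [0, 1, 11, 3]
```

Here the store after int3 cannot issue before cycle 11 (int3 issues at cycle 1 with an assumed latency of 10), while the non-memory add is unaffected.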
Let me know if OK to commit.