This is an archive of the discontinued LLVM Phabricator instance.

[Scheduling] Improve group algorithm for store cluster
ClosedPublic

Authored by steven.zhang on Jul 20 2020, 12:23 AM.

Details

Summary

The scheduler will try to classify the MemOps into different groups and then clustering neighboring MemOps within each group. The current algorithm is to have the MemOps with the same ctrl(non-data) dep into the same group. That works fine for load but not well for store as store might have two memory dep.

See this example: Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads. Current implementation will put these two stores into different group and miss to cluster them.

Load X               Load Y
  ^                    ^
  |                    |
  |mem                 |mem
  |                    |
  +                    +
Store Addr           Store Addr+8
  ^                    ^
  +--------------------+
         cluster

This will affect the case like this.

void foo(long long *restrict a, long long *restrict b, long long *restrict c, int n) {
        for (int i =0; i<n;i++)
            a[i] += b[i]*c[i];
}

Diff Detail

Event Timeline

steven.zhang created this revision.Jul 20 2020, 12:23 AM
steven.zhang edited the summary of this revision. (Show Details)Jul 20 2020, 2:34 AM

Other than that, it seems sensible.

llvm/lib/CodeGen/MachineScheduler.cpp
1663

It doesn't seem to me that the condition needs to be cached in a variable.

Address comments.

evandro accepted this revision.Jul 24 2020, 11:48 AM
This revision is now accepted and ready to land.Jul 24 2020, 11:48 AM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.Aug 7 2020, 7:10 AM