This change exploits the fact that LDS reads may be reordered even if they access the same location.
Prior to this change, the algorithm stopped as soon as it encountered any memory access between the loads that are expected to be merged. However, a read-after-read conflict cannot affect execution correctness.
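As a rough illustration of the difference, here is a minimal sketch of the two checks. It is not the actual pass code: `Instr`, `canMergeLoadsStrict` and `canMergeLoads` are hypothetical names, and the model assumes no alias information, so any intervening store is still treated as a conflict.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
struct Instr {
    bool mayLoad;   // instruction may read memory
    bool mayStore;  // instruction may write memory
};

// Old behaviour: any memory access between the two loads blocks the merge.
bool canMergeLoadsStrict(const std::vector<Instr> &seq,
                         std::size_t first, std::size_t second) {
    for (std::size_t i = first + 1; i < second; ++i) {
        if (seq[i].mayLoad || seq[i].mayStore)
            return false;
    }
    return true;
}

// Relaxed behaviour: intervening reads are ignored, because a
// read-after-read pair cannot change the result. Only writes block the merge.
bool canMergeLoads(const std::vector<Instr> &seq,
                   std::size_t first, std::size_t second) {
    for (std::size_t i = first + 1; i < second; ++i) {
        if (seq[i].mayStore)
            return false;
    }
    return true;
}
```

With three consecutive loads, the strict check refuses to merge the first and third, while the relaxed check allows it. A store between them blocks both.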
Improves the performance of the manually loop-unrolled hcBLAS CGEMM kernels by 44%. An improvement is also expected for any long sequence of reads from LDS.
We don't actually have AA enabled in the backend. We also need to add an address space alias analysis pass. If these were done, would that avoid the need to have this looser check?