LLPC translates SPIR-V memory barriers very pessimistically. At
translation time, LLPC does not know where the memory will be allocated,
so it emits a fence instruction. The fence is eventually lowered to an
instruction that invalidates the memory cache.
If the barrier is for workgroup shared memory and all workgroup shared
variables are allocated to LDS, the L1 cache does not need to be
invalidated, because LDS accesses do not go through the L1 cache.
However, the generated code still invalidates it.
This commit modifies the SIInsertWaitcnts pass to remove cache
invalidation instructions that it can prove are unnecessary. A cache
invalidation is safe to remove if no load or store to cached memory can
reach it without first passing through another cache invalidation.
gfx90a also has L2 cache control. See
https://llvm.org/docs/AMDGPUUsage.html#memory-model-gfx90a