This is used by targets (like r600) that do RMW for truncating stores.
A future optimization should remove most of the redundant store/load pairs.
Fixes R600 regressions since r274397.
Details
- Reviewers
• tstellarAMD bogner
Diff Detail
- Repository
- rL LLVM
Event Timeline
I don't understand why you need this (even if you end up lowering the individual stores using an RMW sequence). Can you please explain?
The problem arises if we use an RMW sequence for two different bytes of the same word, for example when storing bytes at addresses A and A+1. Assume A is 4-byte aligned; the generated code will look like this:
1: r1 = LOAD A
2: r2 = {r1[8:31], x} This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
3: STORE A, r2
4: r3 = LOAD A
5: r4 = {r3[0:7], y, r3[16:31]} This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
6: STORE A, r4
The original code has no dependency between 3 and 4, so 1 and 4 are loads from the same location and get merged into a single load:
1: r1 = LOAD A
2: r2 = {r1[8:31], x} This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
3: STORE A, r2
5: r4 = {r1[0:7], y, r1[16:31]} This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
6: STORE A, r4
(Note that sequences 2,3 and 5,6 are now independent, so the stores can occur in either order; whichever runs second clobbers the other's byte.)
This results in data corruption at A.
This patch adds a chain dependency between 3 and 4, preventing the load elimination. It should still be possible to eliminate both 3 and 4 (which in turn should enable combining the bit operations in 2 and 5).
Why not? I understand why the truncated stores and extending loads might be independent, but if you have code that widens those without looking at potentially aliased peers and adjusting the chain, I think the problem lies there.