In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130
Details
- Reviewers
tpr arsenm nhaehnle - Commits
- rZORGfba03edb5883: [AMDGPU] detect WaW hazards when moving/merging load/store instructions
rGfba03edb5883: [AMDGPU] detect WaW hazards when moving/merging load/store instructions
rGc4bc61bad7b2: [AMDGPU] detect WaW hazards when moving/merging load/store instructions
rL361008: [AMDGPU] detect WaW hazards when moving/merging load/store instructions
Diff Detail
Event Timeline
That's an interesting catch. Basically a Write-after-Write hazard.
Though this seems like the wrong place to add this check, since the function is specifically called canMoveInstsAcrossMemOp.
I rather suspect that instead addToListsIfDependent needs to be fixed instead, so that when the scan hits the S_CMP_EQ_U32 it recognizes the WAW hazard and adds it to the instructions to be moved.
I also wonder if this is related to D60459.
test/CodeGen/AMDGPU/merge-load-store.mir | ||
---|---|---|
62 | You should remove most of the LLVM IR from the test file. That should be possible as long as you remove the references to IR from the MIR below, e.g. the :: (dereferenceable invariant load 4 from %ir.arg.kernarg.offset.cast, addrspace 4) and change bb.0.bb to bb.0 etc. | |
182–183 | I don't think this is needed. | |
210 | It should be possible to further reduce this test case, e.g. most of bb.2 seems to be unnecessary for the test. |
The S_AND_B64 gets picked up as an InstToMove because it uses a value from a memory operation.
When scanning further and considering the S_CMP_EQ_U32, the addToListsIfDependent checks if any of the *uses* of S_CMP_EQ_U32 refer to any of the defs of the S_AND_B64. At this point, you also have to check the *defs* of the S_CMP_EQ_U32 against the defs of the S_AND_B64. At that point, the code should be able to recognize that both instructions write to the physical register SCC, and add the S_CMP_EQ_U32 to the instructions to be moved.
I tried your latest patch on my test case test/CodeGen/AMDGPU/scc-add-lshl-addc.ll in D60459, and it takes care of my issue.
I certainly prefer your patch to mine :-)
Thanks! This looks good, except I still believe the test case can be simplified.
test/CodeGen/AMDGPU/merge-load-store.mir | ||
---|---|---|
62 | What about this comment? |
test/CodeGen/AMDGPU/merge-load-store.mir | ||
---|---|---|
62 | I think I tried and failed to reduce the test case for some reasons. I will try again. |
test/CodeGen/AMDGPU/merge-load-store.mir | ||
---|---|---|
62 | What do you want to remove exactly? The test case seems small. |
test/CodeGen/AMDGPU/merge-load-store.mir | ||
---|---|---|
62 | I think it should be possible to remove all of the LLVM IR. Maybe everything except for a ret void. You'll have to remove the from %ir.tmp1 part in the MIR. |
Thank you too!
Can you eventually push the patch for me? I think I no longer have commit access since the LLVM relicense stuff.
You should remove most of the LLVM IR from the test file. That should be possible as long as you remove the references to IR from the MIR below, e.g. the :: (dereferenceable invariant load 4 from %ir.arg.kernarg.offset.cast, addrspace 4) and change bb.0.bb to bb.0 etc.