This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] detect WaW hazards when moving/merging load/store instructions
ClosedPublic

Authored by hakzsam on Apr 30 2019, 7:30 AM.

Details

Summary
In order to combine memory operations efficiently, the load/store
optimizer might move some instructions around. It's usually safe
to move instructions down past the merged instruction because the
pass checks if memory operations can be re-ordered.

Though, the current logic doesn't handle Write-after-Write hazards.

This fixes a reflection issue with Monster Hunter World and DXVK.

v2: - rebased on top of master
    - clean up the test case
    - handle WaW hazards correctly

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130

Diff Detail

Event Timeline

hakzsam created this revision.Apr 30 2019, 7:30 AM

That's an interesting catch. Basically a Write-after-Write hazard.

Though this seems like the wrong place to add this check, since the function is specifically called canMoveInstsAcrossMemOp.

I rather suspect that instead addToListsIfDependent needs to be fixed instead, so that when the scan hits the S_CMP_EQ_U32 it recognizes the WAW hazard and adds it to the instructions to be moved.

I also wonder if this is related to D60459.

test/CodeGen/AMDGPU/merge-load-store.mir
62

You should remove most of the LLVM IR from the test file. That should be possible as long as you remove the references to IR from the MIR below, e.g. the :: (dereferenceable invariant load 4 from %ir.arg.kernarg.offset.cast, addrspace 4) and change bb.0.bb to bb.0 etc.

145–146

I don't think this is needed.

173

It should be possible to further reduce this test case, e.g. most of bb.2 seems to be unnecessary for the test.

ronlieb added a subscriber: ronlieb.May 3 2019, 5:09 AM

This sure does look like the same problem to me. https://reviews.llvm.org/D60459

have you run all the lit tests on this patch ?

This sure does look like the same problem to me. https://reviews.llvm.org/D60459

D60459 doesn't fix the issue.

have you run all the lit tests on this patch ?

How do I launch that?
I tested with 'make check' btw, no regressions found!

have you run all the lit tests on this patch ?

How do I launch that?
I tested with 'make check' btw, no regressions found!

Those are the same thing

That's an interesting catch. Basically a Write-after-Write hazard.

Though this seems like the wrong place to add this check, since the function is specifically called canMoveInstsAcrossMemOp.

I rather suspect that instead addToListsIfDependent needs to be fixed instead, so that when the scan hits the S_CMP_EQ_U32 it recognizes the WAW hazard and adds it to the instructions to be moved.

I also wonder if this is related to D60459.

How do I recognize the WAW when scanning S_CMP_EQ_U32? I'm not sure to follow you.

That's an interesting catch. Basically a Write-after-Write hazard.

Though this seems like the wrong place to add this check, since the function is specifically called canMoveInstsAcrossMemOp.

I rather suspect that instead addToListsIfDependent needs to be fixed instead, so that when the scan hits the S_CMP_EQ_U32 it recognizes the WAW hazard and adds it to the instructions to be moved.

I also wonder if this is related to D60459.

How do I recognize the WAW when scanning S_CMP_EQ_U32? I'm not sure to follow you.

The S_AND_B64 gets picked up as an InstToMove because it uses a value from a memory operation.

When scanning further and considering the S_CMP_EQ_U32, the addToListsIfDependent checks if any of the *uses* of S_CMP_EQ_U32 refer to any of the defs of the S_AND_B64. At this point, you also have to check the *defs* of the S_CMP_EQ_U32 against the defs of the S_AND_B64. At that point, the code should be able to recognize that both instructions write to the physical register SCC, and add the S_CMP_EQ_U32 to the instructions to be moved.

hakzsam updated this revision to Diff 198455.May 7 2019, 6:38 AM
hakzsam retitled this revision from [AMDGPU] Fix moving memory operations if one instr defines SCC as dead to [AMDGPU] detect WaW hazards when moving/merging load/store instructions.
hakzsam edited the summary of this revision. (Show Details)

This sure does look like the same problem to me. https://reviews.llvm.org/D60459

D60459 doesn't fix the issue.

I tried your latest patch on my test case test/CodeGen/AMDGPU/scc-add-lshl-addc.ll in D60459, and it takes care of my issue.
I certainly prefer your patch to mine :-)

Thanks! This looks good, except I still believe the test case can be simplified.

test/CodeGen/AMDGPU/merge-load-store.mir
62

What about this comment?

hakzsam marked an inline comment as done.May 15 2019, 4:20 AM
hakzsam added inline comments.
test/CodeGen/AMDGPU/merge-load-store.mir
62

I think I tried and failed to reduce the test case for some reasons. I will try again.

hakzsam marked an inline comment as done.May 15 2019, 4:32 AM
hakzsam added inline comments.
test/CodeGen/AMDGPU/merge-load-store.mir
62

What do you want to remove exactly? The test case seems small.

nhaehnle added inline comments.May 15 2019, 9:17 AM
test/CodeGen/AMDGPU/merge-load-store.mir
62

I think it should be possible to remove all of the LLVM IR. Maybe everything except for a ret void. You'll have to remove the from %ir.tmp1 part in the MIR.

hakzsam updated this revision to Diff 199761.May 16 2019, 12:01 AM

v3: - remove most of the LLVM IR

nhaehnle accepted this revision.May 16 2019, 12:14 AM

Thanks!

This revision is now accepted and ready to land.May 16 2019, 12:14 AM

Thank you too!
Can you eventually push the patch for me? I think I no longer have commit access since the LLVM relicense stuff.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2019, 2:32 AM