This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implement memory model
ClosedPublic

Authored by kzhuravl on Sep 15 2016, 1:15 PM.

Details

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 71542.Sep 15 2016, 1:15 PM
kzhuravl retitled this revision from to temp.
kzhuravl updated this object.
kzhuravl removed subscribers: arsenm, wdng, nhaehnle and 2 others.
kzhuravl updated this revision to Diff 72013.Sep 21 2016, 1:10 AM
tony-tye edited reviewers, added: t-tye; removed: tony-tye.Mar 22 2017, 6:10 PM
t-tye added inline comments.Apr 17 2017, 9:23 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
278

Is this needed only if the the instruction is a VMEM or FLAT instruction, not if a DS? It is only ensuring that the load has completed before doing the VMEM invalidate.

t-tye added inline comments.Apr 18 2017, 12:26 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
348

Should this be done? For rmw the glc bit controls whether the original value is returned, not whether the L1 cache is bypassed.

398

Ditto.

420

Ditto.

t-tye added inline comments.Apr 18 2017, 3:50 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
362

Is this required if a DS memory operation? Seems it is only required if a VMEM or FLAT instruction to ensure it has completed before invalidating the cache.

409

Is this required if a DS memory operation? Seems it is only required if a VMEM or FLAT instruction to ensure it has completed before invalidating the cache.

arsenm added inline comments.Apr 18 2017, 4:05 PM
test/CodeGen/AMDGPU/global_atomics.ll
1026–1027

Why does this lose the glc bit?

arsenm added inline comments.Apr 18 2017, 4:07 PM
lib/Target/AMDGPU/SIInstructions.td
515 ↗(On Diff #72013)

There's an SPseduoInst or something like that which will avoid needing to set any of these bits

519 ↗(On Diff #72013)

I don't think this is necessary, only mayLoad and mayStore

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
195–196

If there are no memory operands, this should still work and be handled as conservatively as possible

t-tye added inline comments.
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
234

I believe a waitcnt vmem(0) is required before the InsertBufferWbinvl1Vol to ensure any previous atomic load has completed that the fence will pair with to create a synchronizes-with relation.

kzhuravl planned changes to this revision.Apr 21 2017, 9:22 AM
kzhuravl updated this revision to Diff 106338.Jul 12 2017, 4:14 PM
kzhuravl marked 3 inline comments as done.
kzhuravl retitled this revision from [AMDGPU] Implement memory model to AMDGPU: Implement memory model.
kzhuravl edited reviewers, added: arsenm, tstellar; removed: tstellarAMD.

Update implementation.

kzhuravl added inline comments.Jul 12 2017, 4:14 PM
lib/Target/AMDGPU/SIInstructions.td
515 ↗(On Diff #72013)

Already committed separately with comments taken care of.

519 ↗(On Diff #72013)

Already committed separately with comments taken care of.

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
195–196

Do you see an issue with this approach?

278

As discussed, this will be done in separate change.

348

As discussed, this will be done in separate change.

362

As discussed, this will be done in separate change.

398

As discussed, this will be done in separate change.

409

As discussed, this will be done in separate change.

420

As discussed, this will be done in separate change.

test/CodeGen/AMDGPU/global_atomics.ll
1026–1027

I think it does not need to be set in this case.

arsenm added inline comments.Jul 12 2017, 4:28 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
195–196

Yes. If there are no memory operands, this could skip an atomic load.

t-tye added inline comments.Jul 12 2017, 5:46 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
327

May be worth asserting that FailureOrdering is not AtomicOrdering::Release or AtomicOrdering::AcquireRelease as these are not allowed, and following code relies on that fact.

test/CodeGen/AMDGPU/flat_atomics.ll
1001

Curious why glc is no longer being checked for?

kzhuravl updated this revision to Diff 107151.Jul 18 2017, 12:13 PM
kzhuravl marked 3 inline comments as done.

Address review feedback.

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
327

Verifier takes care of those.

test/CodeGen/AMDGPU/flat_atomics.ll
1001

Not required here. Discussed offline.

t-tye accepted this revision.Jul 18 2017, 5:09 PM

One question, otherwise LGTM.

lib/Target/AMDGPU/SIInstructions.td
102 ↗(On Diff #107151)

Since a fence itself does not modify memory, and has no memory address operand, should it be marked maybeatomic?

This revision is now accepted and ready to land.Jul 18 2017, 5:09 PM
kzhuravl updated this revision to Diff 107323.Jul 19 2017, 9:37 AM

Fix atomic fence expansion for workgroup/wavefront/singlethread scopes.

t-tye accepted this revision.Jul 19 2017, 11:19 AM

LGTM