This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implement memory model
ClosedPublic

Authored by kzhuravl on Sep 15 2016, 1:15 PM.

Details

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 71542.Sep 15 2016, 1:15 PM
kzhuravl retitled this revision from to temp.
kzhuravl updated this object.
kzhuravl removed subscribers: arsenm, wdng, nhaehnle and 2 others.
kzhuravl updated this revision to Diff 72013.Sep 21 2016, 1:10 AM
tony-tye edited reviewers, added: t-tye; removed: tony-tye.Mar 22 2017, 6:10 PM
t-tye added inline comments.Apr 17 2017, 9:23 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
277

Is this needed only if the the instruction is a VMEM or FLAT instruction, not if a DS? It is only ensuring that the load has completed before doing the VMEM invalidate.

t-tye added inline comments.Apr 18 2017, 12:26 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
347

Should this be done? For rmw the glc bit controls whether the original value is returned, not whether the L1 cache is bypassed.

397

Ditto.

419

Ditto.

t-tye added inline comments.Apr 18 2017, 3:50 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
361

Is this required if a DS memory operation? Seems it is only required if a VMEM or FLAT instruction to ensure it has completed before invalidating the cache.

408

Is this required if a DS memory operation? Seems it is only required if a VMEM or FLAT instruction to ensure it has completed before invalidating the cache.

arsenm added inline comments.Apr 18 2017, 4:05 PM
test/CodeGen/AMDGPU/global_atomics.ll
1026

Why does this lose the glc bit?

arsenm added inline comments.Apr 18 2017, 4:07 PM
lib/Target/AMDGPU/SIInstructions.td
515

There's an SPseduoInst or something like that which will avoid needing to set any of these bits

519

I don't think this is necessary, only mayLoad and mayStore

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
194–195

If there are no memory operands, this should still work and be handled as conservatively as possible

t-tye added inline comments.
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
233

I believe a waitcnt vmem(0) is required before the InsertBufferWbinvl1Vol to ensure any previous atomic load has completed that the fence will pair with to create a synchronizes-with relation.

kzhuravl planned changes to this revision.Apr 21 2017, 9:22 AM
kzhuravl updated this revision to Diff 106338.Jul 12 2017, 4:14 PM
kzhuravl marked 3 inline comments as done.
kzhuravl retitled this revision from [AMDGPU] Implement memory model to AMDGPU: Implement memory model.
kzhuravl edited reviewers, added: arsenm, tstellar; removed: tstellarAMD.

Update implementation.

kzhuravl added inline comments.Jul 12 2017, 4:14 PM
lib/Target/AMDGPU/SIInstructions.td
515

Already committed separately with comments taken care of.

519

Already committed separately with comments taken care of.

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
194–195

Do you see an issue with this approach?

277

As discussed, this will be done in separate change.

347

As discussed, this will be done in separate change.

361

As discussed, this will be done in separate change.

397

As discussed, this will be done in separate change.

408

As discussed, this will be done in separate change.

419

As discussed, this will be done in separate change.

test/CodeGen/AMDGPU/global_atomics.ll
1026

I think it does not need to be set in this case.

arsenm added inline comments.Jul 12 2017, 4:28 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
194–195

Yes. If there are no memory operands, this could skip an atomic load.

t-tye added inline comments.Jul 12 2017, 5:46 PM
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
328

May be worth asserting that FailureOrdering is not AtomicOrdering::Release or AtomicOrdering::AcquireRelease as these are not allowed, and following code relies on that fact.

test/CodeGen/AMDGPU/flat_atomics.ll
934–935

Curious why glc is no longer being checked for?

kzhuravl updated this revision to Diff 107151.Jul 18 2017, 12:13 PM
kzhuravl marked 3 inline comments as done.

Address review feedback.

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
328

Verifier takes care of those.

test/CodeGen/AMDGPU/flat_atomics.ll
934–935

Not required here. Discussed offline.

t-tye accepted this revision.Jul 18 2017, 5:09 PM

One question, otherwise LGTM.

lib/Target/AMDGPU/SIInstructions.td
323

Since a fence itself does not modify memory, and has no memory address operand, should it be marked maybeatomic?

This revision is now accepted and ready to land.Jul 18 2017, 5:09 PM
kzhuravl updated this revision to Diff 107323.Jul 19 2017, 9:37 AM

Fix atomic fence expansion for workgroup/wavefront/singlethread scopes.

t-tye accepted this revision.Jul 19 2017, 11:19 AM

LGTM