This is an archive of the discontinued LLVM Phabricator instance.

[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand
ClosedPublic

Authored by kzhuravl on Sep 14 2016, 12:56 PM.

Details

Summary

+ remove redundant getAtomic* member functions from SelectionDAG.

In order to implement memory model in AMDGPU backend, we need to generate a specific sequence of instructions depending on atomic ordering and synchronization scope. These instructions have to stick together, for example:

...
%val = load atomic i32, i32 addrspace(4)* %in acquire, align 4
...

Results in:

...
s_waitcnt
flat_load_dword ...
s_waitcnt
buffer_wbinvl1_vol
...

One approach in implementing this is to use pseudo instructions, and expand them into real instructions in the post-RA expansion pass. The problem we have run into with this approach in AMDGPU backend is we have to define quite a lot of pseudo instructions due to having multiple different instruction opcodes for loads and stores.

Another approach we have explored is to move AtomicOrdering and SynchScope from MemSDNode and AtomicSDNode into MachineMemOperand. This way we do not have to define multiple pseudo instructions, and just use MachineMemOperand. There is also a fixme comment that suggests the same approach.

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 71411.Sep 14 2016, 12:56 PM
kzhuravl retitled this revision from to [RFC] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand.
kzhuravl updated this object.
kzhuravl added reviewers: arsenm, tstellarAMD.
kzhuravl added subscribers: mehdi_amini, reames, jlebar and 4 others.
jlebar added inline comments.Sep 15 2016, 12:02 PM
include/llvm/CodeGen/MachineMemOperand.h
143

I am concerned that this will *substantially* increase the size of MMO.

Do you actually need all this space?

Gerolf added a subscriber: Gerolf.Sep 15 2016, 12:45 PM

My concern is this would increase memory consumption in all backends. Maybe you can hide most of it with an opaque pointer in the machine operand.

kzhuravl updated this revision to Diff 72009.Sep 21 2016, 12:02 AM

Hi, thanks for taking a look. I agree that the previous diff caused increased machine memory operand size. Currently machine memory operand's member layout has a 4 byte hole between BaseAlignLog2 and AAInfo. If we pack scope and orderings as it is done in the updated diff and place it between BaseAlignLog2 and AAInfo the size of the machine memory operand does not change. In addition, I think, having scope and ordering information available in machine memory operand will be useful for the scheduler.

Would that be acceptable?

Thanks,
Konstantin

kzhuravl added inline comments.Sep 21 2016, 12:04 AM
include/llvm/CodeGen/MachineMemOperand.h
123

If this patch (or some variation of it) is acceptable, the size of SynchScope will increase to 8 bits. Furthermore, we can generalize MachineAtomicInfo to AdditionalData with extra 16 bits available.

Currently machine memory operand's member layout has a 4 byte hole between BaseAlignLog2 and AAInfo.

Only on 64-bit platforms, right? Are we willing to bite a 4-byte increase to MMO size on 32-bit?

include/llvm/CodeGen/MachineMemOperand.h
131

I don't think we need this field.

151

Can we do this inside the constructor?

include/llvm/CodeGen/SelectionDAGNodes.h
1198

Should we make the MMO and AtomicSDNode interfaces match? Right now we have getOrdering and getSuccessOrdering on MMO, but only getOrdering on the SDNode.

I think I'd be OK having two functions that both return the success ordering if we assert inside getFailureOrdering / getSuccessOrdering that we have a cmpxchg instruction. Otherwise maybe having only the one function makes sense.

Currently machine memory operand's member layout has a 4 byte hole between BaseAlignLog2 and AAInfo.

Only on 64-bit platforms, right? Are we willing to bite a 4-byte increase to MMO size on 32-bit?

This would be 4-bytes per MachineInstr with an MMO, that doesn't see too bad to me.

Currently machine memory operand's member layout has a 4 byte hole between BaseAlignLog2 and AAInfo.

Only on 64-bit platforms, right? Are we willing to bite a 4-byte increase to MMO size on 32-bit?

This would be 4-bytes per MachineInstr with an MMO, that doesn't see too bad to me.

Alternatively, we can introduce a new class, which will describe atomic information and will be derived from MachineMemOperand (MachineAtomicOperand?). Make atomic accessor functions in MachineMemOperand virtual, and override them in the derived class. This way only 4 byte increase for atomic operations. Any thoughts on this?

kzhuravl updated this revision to Diff 74712.Oct 14 2016, 10:37 AM
kzhuravl edited edge metadata.
kzhuravl marked 3 inline comments as done.

Address inline comments.

include/llvm/CodeGen/MachineMemOperand.h
131

Removed.

151

Moved inside the constructor.

include/llvm/CodeGen/SelectionDAGNodes.h
1198

Added asserts inside getFailureOrdering/getSuccessOrdering.

jlebar accepted this revision.Oct 14 2016, 10:48 AM
jlebar added a reviewer: jlebar.

Alternatively, we can introduce a new class, which will describe atomic information and will be derived from MachineMemOperand (MachineAtomicOperand?). Make atomic accessor functions in MachineMemOperand virtual, and override them in the derived class. This way only 4 byte increase for atomic operations. Any thoughts on this?

This would necessarily add a vptr to every MMO object, increasing its size by one word. So that would be just as bad from a memory perspective (and substantially worse from a CPU perspective).

I think I'm OK with this as-is. I agree it makes sense for these fields to live on MMO.

This revision is now accepted and ready to land.Oct 14 2016, 10:48 AM
kzhuravl retitled this revision from [RFC] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand to [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand.Oct 15 2016, 3:07 PM
kzhuravl edited edge metadata.
This revision was automatically updated to reflect the committed changes.