This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add wave barrier builtin
ClosedPublic

Authored by rampitec on Nov 13 2016, 1:34 AM.

Details

Summary

The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations.
All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA.
The barrier is discarded during code generation.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec updated this revision to Diff 77739.Nov 13 2016, 1:34 AM
rampitec retitled this revision from to [AMDGPU] Add wave barrier builtin.
rampitec updated this object.
rampitec added a reviewer: vpykhtin.
rampitec set the repository for this revision to rL LLVM.
rampitec added a subscriber: llvm-commits.

This also needs a corresponding convergent pseudoinstruction otherwise machine CFG changes could do the same things

include/llvm/IR/IntrinsicsAMDGPU.td
110 ↗(On Diff #77739)

There is no scalar instruction, so this should drop the _s_

rampitec updated this revision to Diff 77754.Nov 13 2016, 11:28 AM
rampitec added a reviewer: arsenm.

Renamed builtin to remove s_ as there is no corresponding instruction.

This also needs a corresponding convergent pseudoinstruction otherwise machine CFG changes could do the same things

I though this must the next step. One thing at a time.

rampitec marked an inline comment as done.Nov 13 2016, 12:52 PM
rampitec updated this revision to Diff 77851.Nov 14 2016, 11:36 AM
rampitec updated this object.
rampitec edited edge metadata.

Added pseudo instruction WAVE_BARRIER to prevent MachineInst optimizations as well.

lib/Target/AMDGPU/SIInstrInfo.cpp
903–906 ↗(On Diff #77851)

Does it matter that Post-RA scheduler runs after the Post-RA Pseudo Expansion pass?

rampitec marked an inline comment as done.Nov 14 2016, 2:09 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
903–906 ↗(On Diff #77851)

I do not think that is important. Scheduler will not move an instruction to a non-equivalent control flow block.

rampitec marked an inline comment as done.Nov 14 2016, 2:10 PM
arsenm added inline comments.Nov 14 2016, 2:39 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
903–906 ↗(On Diff #77851)

I think this should be emitted in the asm printer as a comment rather than deleting it during codegen passes

rampitec updated this revision to Diff 77894.Nov 14 2016, 3:24 PM
rampitec edited edge metadata.

Wave barrier replaced with comment at emission.

arsenm added inline comments.Nov 14 2016, 6:05 PM
lib/Target/AMDGPU/SIInstructions.td
142 ↗(On Diff #77894)

I think this shouldn't be set. I think this will end up counting this as 4 bytes instead of 0

143 ↗(On Diff #77894)

This should probably be empty and set to hasNoSchedulingInfo like si_mask_branch

rampitec updated this revision to Diff 77940.Nov 14 2016, 8:10 PM
rampitec edited edge metadata.

Fixed size calculation for new instructions.
Refined attributes.

rampitec marked 2 inline comments as done.Nov 14 2016, 8:11 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIInstructions.td
142 ↗(On Diff #77894)

If it does not have SALU, then it is VALU, and that is what happens:

  • Bad machine code: VALU instruction does not implicitly read exec mask ***

I have added it to SIInstrInfo::getInstSizeInBytes() instead.

rampitec marked an inline comment as done.Nov 14 2016, 8:14 PM
arsenm accepted this revision.Nov 15 2016, 10:13 AM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Nov 15 2016, 10:13 AM
This revision was automatically updated to reflect the committed changes.