This is an archive of the discontinued LLVM Phabricator instance.

Machine Level CFG Structurizer
ClosedPublic

Authored by jsjodin on Aug 5 2016, 9:54 AM.

Details

Summary

This patch implements CFG structurization based on the paper "Taming Control Divergence in GPUs through Control Flow Linearization" by Jayvant Anantpur, Govindarajan R.
It works somewhat different than the algorithm in the paper, the main difference is that it is region based, and does the transformation backward in stead of forward within a region.

The purpose for the new pass is to do the structurization much later in the compiler to not inhibit optimization. In the AMDGPU backend it is added right before register allocation. The transform is implemented as a target independent pass with a few new callbacks to the target. Hopefully it will be useful to others at some point.

There is a hidden flag added to be able to select the early (old) and the late CFG structurizer. There are some littest failures with the AMDGPU tests when running with the new structurizer, which we are working on fixing.

Diff Detail

Repository
rL LLVM

Event Timeline

jsjodin updated this revision to Diff 66970.Aug 5 2016, 9:54 AM
jsjodin retitled this revision from to Machine Level CFG Structurizer.
jsjodin updated this object.
jsjodin added reviewers: tstellarAMD, arsenm.

I see some commented out code that I missed to remove. I will fix that asap.

jsjodin updated this revision to Diff 66974.Aug 5 2016, 10:28 AM

Remove commented out code.

jsjodin updated this revision to Diff 67570.Aug 10 2016, 12:02 PM

Updated patch to use better debug locations when instructions are created.

arsenm edited edge metadata.Aug 25 2016, 10:10 AM

What effect does this have on what instructions are emitted for the mask modifications? I noticed that sc emits fewer instructions to modify the exec mask. For example, we currently emit for if/then/else

if:
s_and_saveexec_b64 SaveExec, vcc
s_xor_b64 SaveExec, exec, SaveExec

else:
s_or_saveexec_b64 SaveExec, SaveExec
s_xor_b64 exec, exec, SaveExec

endif:
s_or_b64 exec, exec, SaveExec

While sc emits:
if:
s_and_saveexec_b64 SaveExec, vcc

else:
s_andn2_b64, exec, SaveExec, exec

endif:
s_mov_b64 exec, SaveExec

Switching this seems easy, but I tried with the current scheme, but I think it requires modification of how if.break and else.break are inserted

lib/Target/AMDGPU/SIInstrInfo.h
137 ↗(On Diff #67570)

Don't need virtual

jsjodin updated this revision to Diff 73807.Oct 6 2016, 9:25 AM
jsjodin edited edge metadata.

Fixed various bugs related to maintaining live out information. Fixed merge issues.

jsjodin updated this revision to Diff 73815.Oct 6 2016, 10:09 AM
jsjodin edited edge metadata.

Remove virtual

jsjodin marked an inline comment as done.EditedOct 6 2016, 10:13 AM

What effect does this have on what instructions are emitted for the mask modifications? I noticed that sc emits fewer instructions to modify the exec mask. For example, we currently emit for if/then/else

if:
s_and_saveexec_b64 SaveExec, vcc
s_xor_b64 SaveExec, exec, SaveExec

else:
s_or_saveexec_b64 SaveExec, SaveExec
s_xor_b64 exec, exec, SaveExec

endif:
s_or_b64 exec, exec, SaveExec

While sc emits:
if:
s_and_saveexec_b64 SaveExec, vcc

else:
s_andn2_b64, exec, SaveExec, exec

endif:
s_mov_b64 exec, SaveExec

Switching this seems easy, but I tried with the current scheme, but I think it requires modification of how if.break and else.break are inserted

I think that should be okay, since we don't emit if.break or else. break with the new structurizer. I can try it out and see what happens.

Actually we do emit if.break because it is used to get out of loops. I'm not sure if there is a better scheme for this.

jsjodin updated this revision to Diff 78786.Nov 21 2016, 2:33 PM
jsjodin edited edge metadata.

Fixed various bugs. The code now transforms DAGs, Loops and nested regions, and the generated code executes correctly, for some small examples.

jsjodin updated this revision to Diff 79604.Nov 29 2016, 10:44 AM
jsjodin edited edge metadata.

Fix insertSelect to handle VCCZ and VCCNZ conditions.

Please clang-format this.

arsenm added inline comments.Dec 7 2016, 1:31 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
598–604 ↗(On Diff #79604)

I have a patch that more comprehensively implements insertSelect in D26101

jsjodin updated this revision to Diff 80773.Dec 8 2016, 9:27 AM
jsjodin edited edge metadata.

Cleaned up the code a bit and and clang-formatted everything.

jsjodin marked an inline comment as done.Dec 8 2016, 9:30 AM
jsjodin added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
598–604 ↗(On Diff #79604)

Good! Are you planning to submit this soon? I guess any reviewer can take note of this and ignore this code.

Revert back to "Not Done", accidentally checked the box.

jsjodin updated this revision to Diff 80947.Dec 9 2016, 1:35 PM
jsjodin edited edge metadata.

Added function to only run the structurizer on a region if there are non-uniform branch instructions present. The interface function should perhaps be called something more generic. Also synced with trunk.

jsjodin updated this revision to Diff 91067.Mar 8 2017, 1:50 PM

Moved the code into the AMDGPU backend since there was little interest in a generic version. If someone wants to use this in the future, it is easy to lift it out into CodeGen again.

jsjodin added inline comments.Mar 27 2017, 11:19 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
598–604 ↗(On Diff #79604)

I changed this function to insertVectorSelect. We can merge the two functions later on. I don't want this patch to affect anything.

arsenm added inline comments.Apr 4 2017, 5:15 PM
lib/CodeGen/MachineRegionInfo.cpp
108 ↗(On Diff #91067)

Adding commented out code

126 ↗(On Diff #91067)

This should be committed separately

lib/Target/AMDGPU/SIInstrInfo.cpp
464–466 ↗(On Diff #91067)

This should just insert copies?

479 ↗(On Diff #91067)

This will never see a VReg_1

1055 ↗(On Diff #91067)

succ_empty

598–604 ↗(On Diff #79604)

I committed the patch for insertSelect already

lib/Target/AMDGPU/SIInstructions.td
174 ↗(On Diff #91067)

Does this also need to setDef?

175 ↗(On Diff #91067)

There is a CFPseudoInst which will handle the use/def of EXEC for you

jsjodin planned changes to this revision.Apr 10 2017, 10:42 AM
jsjodin added inline comments.
lib/CodeGen/MachineRegionInfo.cpp
108 ↗(On Diff #91067)

Will fix.

126 ↗(On Diff #91067)

I will split the MachineRegionInfo changes into a separate patch.

126 ↗(On Diff #91067)

I will split the MachineRegionInfoPass changes to a separate patch.

lib/Target/AMDGPU/SIInstrInfo.cpp
464–466 ↗(On Diff #91067)

It should just set a register to a specific value.

479 ↗(On Diff #91067)

I will remove that case.

1055 ↗(On Diff #91067)

Will fix.

598–604 ↗(On Diff #79604)

I called this function insertVectorSelect since it only deals with vector ops, and I think insertSelect only does scalar at the moment. I'd prefer no overlap for now since this is only used by the structurizer. It could be merged later.

lib/Target/AMDGPU/SIInstructions.td
174 ↗(On Diff #91067)

I don't know. What do you think?

175 ↗(On Diff #91067)

So what should the change be instead? Is this instruction a duplicate of the other instruction?

jsjodin updated this revision to Diff 94861.Apr 11 2017, 11:18 AM
jsjodin marked 11 inline comments as done.

Removed the MachineRegion changes and submitted in a different patch. Fixed a few of the comments made by Matt.

Also fixed succ_empty.

arsenm added inline comments.Apr 21 2017, 4:13 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
494 ↗(On Diff #94861)

This is really materializeImmediate or something like that

495 ↗(On Diff #94861)

const DebugLoc &

496 ↗(On Diff #94861)

Value should be int64_t

502 ↗(On Diff #94861)

addImms on next lines

508–512 ↗(On Diff #94861)

Do you actually need to handle all of these large register classes?

726 ↗(On Diff #94861)

2 space indent

1083 ↗(On Diff #94861)

assert(Info->isEntryFunction())

4219 ↗(On Diff #94861)

() not needed around MF

arsenm added inline comments.Apr 21 2017, 4:16 PM
lib/Target/AMDGPU/SIInstructions.td
174 ↗(On Diff #91067)

It's expanded into instructions that do, so it should have it as a def too

jsjodin updated this revision to Diff 96455.Apr 24 2017, 1:21 PM

Fixed issues raised by Matt.

jsjodin updated this revision to Diff 96461.Apr 24 2017, 1:50 PM

Fixed remaining comments.

jsjodin marked 7 inline comments as done.Apr 24 2017, 1:53 PM
jsjodin added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
508–512 ↗(On Diff #94861)

Yes, I added these as I encountered them in the various tests.

lib/Target/AMDGPU/SIInstructions.td
175 ↗(On Diff #91067)

Okay, now I realized what you meant. I used the CFPseudoInstSI

jsjodin marked 3 inline comments as done.Apr 25 2017, 6:45 AM
arsenm added inline comments.May 2 2017, 10:37 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
1530 ↗(On Diff #96461)

nullptr

1699–1701 ↗(On Diff #96461)

No return after else

598–604 ↗(On Diff #79604)

insertSelect should handle an arbitrary number of elements

lib/Target/AMDGPU/SIInstrInfo.h
156 ↗(On Diff #96461)

const DebugLoc &

160 ↗(On Diff #96461)

const DebugLoc &

163–168 ↗(On Diff #96461)

Accidentally added this?

785–789 ↗(On Diff #96461)

These don't need to be virtual anymore?

jsjodin updated this revision to Diff 98313.May 9 2017, 9:49 AM

Fixed issues raised by Matt.

jsjodin marked 6 inline comments as done.May 9 2017, 9:52 AM
jsjodin added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
598–604 ↗(On Diff #79604)

Sure, insertSelect does, but this function is not part of the InstrInfo interface. When the functions are merged this can be fixed.

arsenm added inline comments.May 9 2017, 7:16 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
605 ↗(On Diff #98313)

This can still be condensed similarly to how copyPhysReg does it

jsjodin updated this revision to Diff 98785.May 12 2017, 9:22 AM
jsjodin marked an inline comment as done.

Made the implementation of materializeImmediate more compact as Matt suggested.

jsjodin marked an inline comment as done.May 12 2017, 9:23 AM
arsenm accepted this revision.May 15 2017, 2:27 AM

LGTM

This revision is now accepted and ready to land.May 15 2017, 2:27 AM
This revision was automatically updated to reflect the committed changes.