Page MenuHomePhabricator

Insert two S_NOP instructions for every high level source statement.
ClosedPublic

Authored by kzhuravl on Feb 19 2016, 9:12 AM.

Details

Summary

Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 48502.Feb 19 2016, 9:12 AM
kzhuravl retitled this revision from to Insert nop for each high level source statement.
kzhuravl updated this object.
kzhuravl added reviewers: tstellarAMD, arsenm.
kzhuravl added a subscriber: llvm-commits.
lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
1 ↗(On Diff #48502)

The file name in the header is wrong, but I think the file and the class names should be renamed to use the 'SI' prefix rather than the 'AMDGPU' prefix to be consistent with other GCN only passes.

11–19 ↗(On Diff #48502)

Why do we need two passes. Can't we just insert the S_NOP instructions in the first pass?

122 ↗(On Diff #48502)

Style comment. We usually use MBB as a variable name when iterating over blocks and MI when iterating over Machine Instructions. This will make it more obvious what the code is doing.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269

We should also be running this pass if the user specifies -g.

270

This should be added to GCNPassConfig since it is GCN only.

One other question. Is there some reason why the compiler can't add the s_trap instructions instead of having the debugger add them.

tools want to selectively substitute S_NOPs with S_TRAPs based on the user input

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
1 ↗(On Diff #48502)

ok

11–19 ↗(On Diff #48502)

this should work with different optimization levels. for o0 one pass works fine. in other opt levels instructions are reordered at different compilation stages. first pass inserts DEBUG_NOP pseudo instructions before register allocation. DEBUG_NOP pseudo instruction has isTerminator attribute, which makes reordering across DEBUG_NOPs not possible. second pass lowers DEBUG_NOPs to S_NOPs right before machine code is emitted.

122 ↗(On Diff #48502)

ok

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269

tools team specifically asked not to do this. for example, for profiling they want to run it with -g, but without inserting nops

270

ok

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
11–19 ↗(On Diff #48502)

By the time we get to running the first pass, the code will have already been re-ordered by the LLVM IR passes as well as the SelectionDAG. We also can't insert instructions with terminators in the middle of blocks, because this will break other passes (and the verifier).

Can we start with one pass and if the result isn't good enough then maybe look for other solutions?

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269

Ok, so the nops are required for both debugging and profiling? Is the command line flag you added accessible from clang?

kzhuravl updated this revision to Diff 48625.Feb 21 2016, 9:26 AM
kzhuravl retitled this revision from Insert nop for each high level source statement to Insert two S_NOP instructions for every high level source statement..
kzhuravl updated this object.
kzhuravl edited edge metadata.

After discussion with Tools, it was decided to insert two S_NOPs for each high level source statement, this way we do not have to disable any optimizations in non-O0 opt levels

kzhuravl marked 12 inline comments as done.Feb 21 2016, 9:29 AM
kzhuravl added inline comments.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269

nops are only required for debugging. for profiling they want to have debug info, but no nops. command line flag is accessible from clang

kzhuravl marked an inline comment as done.Feb 21 2016, 9:34 AM

Tom's feedback

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
11–19 ↗(On Diff #48502)

After discussion with Tools, it was decided to insert two S_NOPs for each high level source statement, this way we do not have to disable any optimizations in non-O0 opt levels. One S_NOP is inserted before first isa instruction of high level source stmt and after last isa instruction of high level source stmt. Updated the diff which includes one pass

kzhuravl updated this revision to Diff 48628.Feb 21 2016, 10:07 AM

Minor fixes

arsenm edited edge metadata.Feb 22 2016, 12:12 PM

This needs some tests

lib/Target/AMDGPU/SIInsertNopsPass.cpp
23

I don't think this is needed

48–49

If you don't need to initialize dependencies, you can use INITIALIZE_PASS without _BEGIN/_END

62

I think this needs a better name. LineToInst?

65

Can you invert the condition and continue to reduce indentation

70

addImm should be on next line

78–81

I think the use of auto and second here is hard to follow. ->second should be assigned to a variable, and no auto

81

++MI should be new line

86

I would prefer having an explicit MBB variable instead of using the multiple fronds here

kzhuravl marked 8 inline comments as done.Feb 22 2016, 1:00 PM
kzhuravl updated this revision to Diff 48721.Feb 22 2016, 1:19 PM
kzhuravl edited edge metadata.

Matt's feedback

This revision was automatically updated to reflect the committed changes.