Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.
Details
Diff Detail
Event Timeline
lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp | ||
---|---|---|
1 ↗ | (On Diff #48502) | The file name in the header is wrong, but I think the file and the class names should be renamed to use the 'SI' prefix rather than the 'AMDGPU' prefix to be consistent with other GCN only passes. |
11–19 ↗ | (On Diff #48502) | Why do we need two passes. Can't we just insert the S_NOP instructions in the first pass? |
122 ↗ | (On Diff #48502) | Style comment. We usually use MBB as a variable name when iterating over blocks and MI when iterating over Machine Instructions. This will make it more obvious what the code is doing. |
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
269 | We should also be running this pass if the user specifies -g. | |
270 | This should be added to GCNPassConfig since it is GCN only. |
One other question. Is there some reason why the compiler can't add the s_trap instructions instead of having the debugger add them.
tools want to selectively substitute S_NOPs with S_TRAPs based on the user input
lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp | ||
---|---|---|
1 ↗ | (On Diff #48502) | ok |
11–19 ↗ | (On Diff #48502) | this should work with different optimization levels. for o0 one pass works fine. in other opt levels instructions are reordered at different compilation stages. first pass inserts DEBUG_NOP pseudo instructions before register allocation. DEBUG_NOP pseudo instruction has isTerminator attribute, which makes reordering across DEBUG_NOPs not possible. second pass lowers DEBUG_NOPs to S_NOPs right before machine code is emitted. |
122 ↗ | (On Diff #48502) | ok |
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
269 | tools team specifically asked not to do this. for example, for profiling they want to run it with -g, but without inserting nops | |
270 | ok |
lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp | ||
---|---|---|
11–19 ↗ | (On Diff #48502) | By the time we get to running the first pass, the code will have already been re-ordered by the LLVM IR passes as well as the SelectionDAG. We also can't insert instructions with terminators in the middle of blocks, because this will break other passes (and the verifier). Can we start with one pass and if the result isn't good enough then maybe look for other solutions? |
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
269 | Ok, so the nops are required for both debugging and profiling? Is the command line flag you added accessible from clang? |
After discussion with Tools, it was decided to insert two S_NOPs for each high level source statement, this way we do not have to disable any optimizations in non-O0 opt levels
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
---|---|---|
269 | nops are only required for debugging. for profiling they want to have debug info, but no nops. command line flag is accessible from clang |
Tom's feedback
lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp | ||
---|---|---|
11–19 ↗ | (On Diff #48502) | After discussion with Tools, it was decided to insert two S_NOPs for each high level source statement, this way we do not have to disable any optimizations in non-O0 opt levels. One S_NOP is inserted before first isa instruction of high level source stmt and after last isa instruction of high level source stmt. Updated the diff which includes one pass |
This needs some tests
lib/Target/AMDGPU/SIInsertNopsPass.cpp | ||
---|---|---|
24 | I don't think this is needed | |
49–50 | If you don't need to initialize dependencies, you can use INITIALIZE_PASS without _BEGIN/_END | |
63 | I think this needs a better name. LineToInst? | |
66 | Can you invert the condition and continue to reduce indentation | |
71 | addImm should be on next line | |
79–82 | I think the use of auto and second here is hard to follow. ->second should be assigned to a variable, and no auto | |
82 | ++MI should be new line | |
87 | I would prefer having an explicit MBB variable instead of using the multiple fronds here |
We should also be running this pass if the user specifies -g.