Page MenuHomePhabricator

[Debugify] Support checking Machine IR debug info
AcceptedPublic

Authored by xiangzhangllvm on Tue, Nov 17, 12:58 AM.

Details

Summary

Add mir-check-debug pass to check MIR-level debug info.

For IR-level, currently, LLVM have debugify + check-debugify to generate and check debug IR.
Much like the IR-level pass debugifymir-debugify inserts sequentially increasing line locations to each MachineInstr in a Module,
But there is no equivalent MIR-level check-debugify pass, So now we support it at "mir-check-debug".

Currently (our first step), what mir-check-debug do is similar with check-debugify, to check the integrity of line info in DILocation and DILocalVariable
which mir-debugify generated before.

The 3 tests in patch show how this tools work.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptTue, Nov 17, 12:58 AM
xiangzhangllvm requested review of this revision.Tue, Nov 17, 12:58 AM
xiangzhangllvm edited the summary of this revision. (Show Details)

Thanks for doing this! (some initial comments inline)

Please add some context about this into https://llvm.org/docs/HowToUpdateDebugInfo.html#id14.

llvm/lib/CodeGen/MachineCheckDebug.cpp
97 ↗(On Diff #305666)

clang-format please

llvm/lib/CodeGen/MachineDebugify.cpp
145

clang-format please

149

the comments should go above

llvm/test/CodeGen/Generic/MIRDebugify/check-line-and-variables-x.mir
47

we can get rid of these tbaa metadata

llvm/test/CodeGen/Generic/MIRDebugify/check-line-and-variables.mir
32

I think we do not need all of these attributes, only name and alignment

djtodoro edited reviewers, added: dsanders; removed: dsanders11.
xiangzhangllvm retitled this revision from [Debug] Support checking Machine IR debug info to [Debugify] Support checking Machine IR debug info.
xiangzhangllvm marked 5 inline comments as done.Tue, Nov 17, 5:26 PM

All done, @djtodoro thanks very much for you review !

pengfei added inline comments.Tue, Nov 17, 5:29 PM
llvm/docs/HowToUpdateDebugInfo.rst
346

Remove the right parentheses.

xiangzhangllvm marked an inline comment as done.
xiangzhangllvm added inline comments.
llvm/docs/HowToUpdateDebugInfo.rst
346

Oh! Yes, Thanks a lot !

djtodoro added inline comments.Wed, Nov 18, 1:48 AM
llvm/include/llvm/CodeGen/TargetPassConfig.h
316

please add synthesized debug info, since this checks the debugging info from debugify only (for now)

llvm/lib/CodeGen/CMakeLists.txt
77

I would pick MachineCheckDebugify, wdyt?

llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

I think we should do this for finding missing variables; but for the checking of missing lines we should check more machine instructions (especially the instrs other than debug ones), right? Any reason we shouldn't do that?

Please check the checkDebugifyMetadata() (from Debugify.cpp) how it was done.

xiangzhangllvm marked an inline comment as done.Wed, Nov 18, 4:34 PM
xiangzhangllvm added inline comments.
llvm/lib/CodeGen/CMakeLists.txt
77

Yes, I noticed that, I also found mir-strip-debug implemented in MachineStripDebug.cpp, so I sync it with mir-strip-debug.

llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

Because the MirDebugify will generate DBG_VALUE for every MIRs, all these MIRs' debug location will be saved in these generated DBG_VALUEs. So here we just need to check the line info in these DBG_VALUEs. It is also ok the check the non_debug MIR again, but it is duplicated.

LuoYuanke added inline comments.Wed, Nov 18, 5:37 PM
llvm/lib/CodeGen/MachineCheckDebug.cpp
28 ↗(On Diff #305941)

Why is it struct, not class?

llvm/lib/CodeGen/MachineDebugify.cpp
58

Why terminator instruction don't need debug info?

llvm/test/CodeGen/Generic/MIRDebugify/check-line-and-variables-x.mir
59

Do we need blank character between ';' and 'DBG_VALUE'?

xiangzhangllvm added inline comments.Wed, Nov 18, 6:01 PM
llvm/lib/CodeGen/MachineCheckDebug.cpp
28 ↗(On Diff #305941)

Struct is similar with "Class", (default to public for its members)
I sync this code from mir-strip-debug pass.

llvm/lib/CodeGen/MachineDebugify.cpp
58

because later LiveDebugVariables will drops out-of-liveness DBG_VALUEs, as the register allocator will not maintain the location of its vreg for positions that aren't live.
Pls refer https://github.com/llvm/llvm-project/blob/68ac02c0dd2b8fda52ac132a86f72f2ad6b139a5/llvm/lib/CodeGen/LiveDebugVariables.cpp#L657

xiangzhangllvm added inline comments.Wed, Nov 18, 9:39 PM
llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

Hi, @djtodoro
Say it is "duplicated" may not accurate, here we only check the debug location in DBG_VALUE MIRs is more strict than check both of them. Because the optimizations may change/update the value of DBG_VALUE (its 1st operand), but they shouldn't change/remove the location info of it. And directly remove a DBG_VALUE should always report warning.

xiangzhangllvm marked 4 inline comments as done.Wed, Nov 18, 9:45 PM
xiangzhangllvm added inline comments.
llvm/lib/CodeGen/CMakeLists.txt
77

If you still prefer MachineCheckDebugify, I am ok to update it, include the option name.

djtodoro added inline comments.Thu, Nov 19, 5:59 AM
llvm/lib/CodeGen/CMakeLists.txt
77

But there is still MachineDebugify.cpp, so I think MachineCheckDebugify.cpp is more appropriate name.

llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

On the IR level, debugify utility pass can tell us: there is an instruction I with empty DebugLoc after a pass. How it works: before a pass, it attaches synthetic dbg loc onto each IR instruction, and after the pass it checks if it preserved the dbg loc.

Can we do the same for MIR Debugify? For example: the mir-debugify attaches a dbgLoc on an MI (non debug one), and then the mir-check-debigify checks if a pass preserved that dbg location? Is it even possible at MIR level?

jmorse added inline comments.Thu, Nov 19, 6:52 AM
llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

I agree with Djordje, the line numbers attached to DBG_VALUE instructions are of little interest and never reach an output file, it's the non-meta instructions that are the subject of optimisations, and thus should be the target of line number checkers.

llvm/lib/CodeGen/MachineDebugify.cpp
58

Terminators probably shouldn't have variable location information attached to them. However this block is adding line numbers / source locations to instructions, and terminator line numbers can be useful (for example tail calls, or representing the end of a lexical block).

114–115

If I'm reading this correctly, won't NumLines always finish with the value NextLine-1, the counter used when setting line numbers above??

118

Couldn't this just be extracted from the Line2Var collection built earlier?

138
139
MaskRay added inline comments.Thu, Nov 19, 2:21 PM
llvm/test/CodeGen/Generic/MIRDebugify/check-line-and-variables.ll
2

Does this file add new coverage?

This file explain its purpose in the file header. The body of foo is mostly unused, you can add just a ret. If you don't otherwise check the output, use -filetype=null

llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

Hello @jmorse @djtodoro, first thanks for your comments, I was thinking about it before, and it is nice discuss with you now:
What we really want (MIR/IR)check-debug doing for us ?
Take (IR)check-debug for e.g.
In my eyes, we use check-debug to check if the optimizations incorrectly optimized the debug info of IRs. Right?
Some optimizations may correctly optimized the IRs by removing them with carefully handling its debug info. (e.g. --instsimplify), for these cases, I think, check-debug should not report WARNING messages out.

Take llvm lit test llvm/test/DebugInfo/debugify.ll for e.g.
There is unused %sum=... in it, let's remove it in instruction simplify.
After debugify, it will be:

define i32 @bar() !dbg !12 {
  call void @foo(), !dbg !15
  %sum = add i32 0, 1, !dbg !16
  call void @llvm.dbg.value(metadata i32 %sum, metadata !14, metadata !DIExpression()), !dbg !16
  ret i32 0, !dbg !17
}

If I run

opt -debugify --instsimplify -check-debugify -S -o - DebugInfo/debugify.ll

it will report WARNING in -check-debugify:

WARNING: Missing line 3
CheckModuleDebugify: PASS
......
define i32 @bar() !dbg !12 {
  call void @foo(), !dbg !15
  call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !16
  ret i32 0, !dbg !17
}

because the check-debug will check the line info of removed instruction.
What is more, removing instruction is very common in optimization passes, and debugify will add different line info for every instructions, so once remove any non-debug instructions, it will give out WARNINGs. Even they correctly remove them (by updating its llvm.dbg.value (0/undef, ....).

If the test is big, we may need to check a lot of warnings one by one.

And that was why my current patch didn't check the line info for non-debug MIRs.

llvm/lib/CodeGen/MachineDebugify.cpp
114–115

Here is really risk:
I write it here because NextLine-1 may less than line in DILocation generated at (IR)debugify before.
Line 120 update the line from IR (not MIR).
I'll refine here, tks!

118

Yes, but seems same, and doesn't it is better here? -- the life range of Varset is short.

138

I'll fix them, thank you!

llvm/test/CodeGen/Generic/MIRDebugify/check-line-and-variables.ll
2

Terminator IR ret is too simple to check the debug info.
Currently I check two MIR pass will pass this debug info check.
And I think it is better for this *.ll file to keep "same" with its MIR file check-line-and-variables.mir.
tks!

djtodoro added inline comments.Fri, Nov 20, 12:53 AM
llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

Yes, but that is why it is being reported as warning...
Use case: For example, we use debugify and realize that a bunch of add instructions don't have a dbg location attached after a pass X - we investigate pass X, and find out that there is a creating of the instruction add, but we have missed to add I->setDebugLoc(). It makes sense when dealing with non-dbg instructions only, since the locations of these are used during debugging sessions, since only these instructions end up in the final code.

Please consider the code from trunk (IR Debugify.cpp):

// Find missing lines.
for (Instruction &I : instructions(F)) {
  if (isa<DbgValueInst>(&I) || isa<PHINode>(&I))
    continue;

  auto DL = I.getDebugLoc();
  if (DL && DL.getLine() != 0) {
    MissingLines.reset(DL.getLine() - 1);
    continue;
  }

  if (!DL) {
    dbg() << "WARNING: Instruction with empty DebugLoc in function ";
    dbg() << F.getName() << " --";
    I.print(dbg());
    dbg() << "\n";
  }
}

This gives us a chance to investigate if a pass drops the location accidentally or it should have done it, but only for non-debug ones, as:

if (isa<DbgValueInst>(&I) || isa<PHINode>(&I))
  continue;
xiangzhangllvm added inline comments.Fri, Nov 20, 1:03 AM
llvm/lib/CodeGen/MachineCheckDebug.cpp
58 ↗(On Diff #305941)

Thanks for the explanation, I'll check the non-debug MIR.

xiangzhangllvm marked 6 inline comments as done.Fri, Nov 20, 7:56 PM

Thanks!

llvm/lib/CodeGen/MachineCheckDebugify.cpp
57

I'm not against adding two loops -- one for finding missing lines, and the other one for finding missing variables.

59

Not sure we want to report missing dbgLoc for meta instructions other than DBG_VALUE (e.g., DBG_LABEL does not end up in the final code as well..)

68

Early continue?

xiangzhangllvm added inline comments.Mon, Nov 23, 4:35 PM
llvm/lib/CodeGen/MachineCheckDebugify.cpp
57

Yes, two loops is clear

59

For missing line, we should check all the MIRs which contribute the line number in MachineDebugify -- (non DBG_VALUEs), so here we just need to check MI.isDebugValue().

68

Still need to do line 73, I'll let it clear by using 2 loops soon. thank you!

72

Duplicated "\n" with MI.print(), remove here.

xiangzhangllvm added a reviewer: djtodoro.
xiangzhangllvm marked 3 inline comments as done.

Nice! (few nits inline)

llvm/lib/CodeGen/MachineCheckDebugify.cpp
10

checks

11

checks

59

I see, but that should be fixed at MachineDebugify side first. At least, we can put a TODO (or FIXME) marker, saying "Avoid meta instructions other than dbg_val"

66

Nit: please add \n here.

72

Nit: please add \n here.

xiangzhangllvm marked 5 inline comments as done.Tue, Nov 24, 4:34 PM

Thanks very much for your reviews!!

Thanks for addressing the comments! We are almost there.

llvm/docs/HowToUpdateDebugInfo.rst
412

The name of option -debugify-and-check-strip-all-safe seems a bit confusing ...

llvm/lib/CodeGen/MachineCheckDebugify.cpp
53–56

We don't need MaybeMF,

MachineFunction *MF = MMI.getMachineFunction(F);
if (!MF)
  continue;

and use *MF.

llvm/lib/CodeGen/TargetPassConfig.cpp
126

The description should be more precise.

xiangzhangllvm added inline comments.Wed, Nov 25, 9:30 PM
llvm/lib/CodeGen/MachineCheckDebugify.cpp
53–56

Yes, I'll refine here, thank you again : )

llvm/lib/CodeGen/TargetPassConfig.cpp
126

The name and description of "debugify-and-check-strip-all-safe" refer to upper "debugify-and-strip-all-safe", Maybe It is a little different for our non-English speaking people to name or descript well. Do you have good suggestion of its name ?
How about I refine the description to "Debugify MIR before each pass except those known to be unsafe when debug info is present, then run mir-check-debugify for them, and finally strip the debug info with mir-strip-debug" ?

djtodoro added inline comments.Wed, Nov 25, 11:49 PM
llvm/lib/CodeGen/TargetPassConfig.cpp
126

The name and description of "debugify-and-check-strip-all-safe" refer to upper "debugify-and-strip-all-safe"

I think "debugify-check-and-strip-all-safe" would be more appropriate, wdyt?

(DebugifyAndCheckStripAll should be DebugifyCheckAndStripAll)

How about I refine the description to "Debugify MIR before each pass except those known to be unsafe when debug info is present, then run mir-check-debugify for them, and finally strip the debug info with mir-strip-debug" ?

Hmm... we are checking/stripping after each pass right?

"Debugify MIR before, by checking and stripping the debug info after, each pass except those known to be unsafe when debug info is present" ?

xiangzhangllvm added inline comments.Thu, Nov 26, 2:57 AM
llvm/lib/CodeGen/TargetPassConfig.cpp
126

think "debugify-check-and-strip-all-safe" would be more appropriate, wdyt?

Looks good!

Hmm... we are checking/stripping after each pass right?

Yes, same with "debugify-and-strip-all-safe".

"Debugify MIR before, by checking and stripping the debug info after, each pass except those known to be unsafe when debug info is present"

That's better, thank you!!

xiangzhangllvm marked 3 inline comments as done.
djtodoro accepted this revision.Thu, Nov 26, 6:46 AM

LGTM, thanks!

(please leave some time (a few days) to other reviewers to take a look)

llvm/lib/CodeGen/MachineCheckDebugify.cpp
76

Please add a TODO marker for DBG_INSTR_REF as well (it is still under an experimental option, but we should be aware it is coming).

This revision is now accepted and ready to land.Thu, Nov 26, 6:46 AM
xiangzhangllvm added inline comments.Thu, Nov 26, 3:58 PM
llvm/lib/CodeGen/MachineCheckDebugify.cpp
76

OK, I'll mark it when I commit it (next week).
For DBG_INSTR_REF it is a good idea to split the value and location for llvm.dbg.value.
Thanks for all your works !!