Page MenuHomePhabricator

[DEBUG INFO] Fixing cases where debug info (-g) causes changes in the program.
Needs ReviewPublic

Authored by jonpa on Apr 20 2018, 6:19 AM.

Details

Summary

I tried -enable-ipra on SPEC (SystemZ) and got three benchmark crashes, which disappeared with -g. The explanation I got on this is that if debug-info changes the program, it is a bug. Optimizers should ignore debug-info. Indeed, when looking into this, I found many cases where debug info instructions (/intrinsics) affect the optimizers because they forgot to e.g. iterate past a debug-instruction (which uses a register).

I have gotten to a point where my original crash with -enable-ipra can be compiled with -g and debugged (identical text sections). There are still more cases (~10 files) when building SPEC-2006 with -O3 with and w/out -g, that do not have identical output with objdump -d on object files. I have so far handled only cases that came up, except the SystemZ CondTrap which was handled on the fly, when handling the BRCT conversion.

I am not 100% sure what is expected for DBG_VALUEs or (llvm.dbg.value intrinsics), so I have merely begun by mimicking a handling I found in MachineSink.cpp. Since I didn't know where to add this as a utility function, I have now a lot of local copies of that function, which of course should be fixed.

I have assumed:

  1. If an instruction is moved, then the immediately following DBG_VALUE(s) that read the defined register is also moved with it.
  1. If an instruction is erased, those DBG_VALUE(s) would also be erased.

So far, I have found use for these new utility functions: collectDebugValues, moveDebugValues, eraseDebugValues. I would like to put these in as utilities somewhere, and use them everywhere. The caller should own the DbgValues vector, I think. Is there a file I could put this in, or perhaps it could be part of MachineInstr/MachineBasicBlock?

TODO:

  1. utility functions instead of copied code. Reuse also in MachineSink.cpp (both pre/post RA).
  2. tests: Can we test explicitly that -g does not alter the text section, like via objdump two times (with and w/out -g) into temporary files, or do I need to make clever tests that checks the specific codegen that I found?
  3. This seems wildly neglected (as it is in gcc, I hear). What could we do?

Diff Detail

Event Timeline

jonpa created this revision.Apr 20 2018, 6:19 AM

Could you please split this up into one patch per functional change and upload each of them with a MIR testcase? If you don't have them already you can probably just take an existing IR testcase and run the debugify pass on it and then llc it with -stop-after-regalloc it to synthesize a testcase.

lib/CodeGen/MachineCopyPropagation.cpp
395

Please don't repeat the function name in the comment.

400

Please comment why this early exit is necessary.

403

Please clang-format.

407

For a function that is called collectDebugValues it looks suspicious that this is a return instead of a continue. Could you please add a comment explaining why this is correct?

lib/CodeGen/MachineSink.cpp
1166

// Collect ...

lib/Target/SystemZ/SystemZElimCompare.cpp
190

see above

192

This looks like the same function as above. Factor it out into a general utility?

256

Please comment why they are being skipped and and perhaps use isMetaInstruction() instead?

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
291

isMetaInstruction?

jonpa marked an inline comment as done.Apr 20 2018, 8:51 AM
jonpa added inline comments.
lib/Target/SystemZ/SystemZElimCompare.cpp
192

Yes, as I wrote in the description, this is my plan. Please help me with where you think that function should be placed.

Could also you please provide a full context diff when you update this differential?

gberry added a subscriber: gberry.Apr 26 2018, 7:33 AM

I've been looking into this issue for AArch64 recently as well (see bug 37240 for a PostRA scheduler issue I found). I'll be filing more bugs and/or posting fixes for other cases that we hit on our target with our different compiler flags in the next few days.

As far as testing, I think it would be better to have bots set up that do with/without -g compile and objdump diff. Writing lit tests that check for the specific bugs fixed is probably a good idea too, but those seem much less useful since the likelihood of the same bug turning up again is relatively low?

lib/Target/SystemZ/SystemZElimCompare.cpp
192

Presumably in the MachineInstruction and MachineBlock classes, respectively?

lib/Transforms/IPO/FunctionAttrs.cpp
1276

Perhaps this would be better handled by marking the debug intrinsics as 'norecurse'?

jonpa added a comment.Apr 28 2018, 2:09 AM

I've been looking into this issue for AArch64 recently as well (see bug 37240 for a PostRA scheduler issue I found). I'll be filing more bugs and/or posting fixes for other cases that we hit on our target with our different compiler flags in the next few days.

As far as testing, I think it would be better to have bots set up that do with/without -g compile and objdump diff. Writing lit tests that check for the specific bugs fixed is probably a good idea too, but those seem much less useful since the likelihood of the same bug turning up again is relatively low?

Having a bot that checks this with objdump diff sounds very useful to me.

I started to write the utility functions in MachineInstr/MachineBasicBlock, but thinking about it, I wonder if it wouldn't be even better to integrate fully the handling of DBG_VALUEs and DebugLocs into splice()? After all, the DBG_VALUEs should *always* be (re)moved together with the MachineInstr, so why expose this to a particular optimizer pass? If splice() would do this per default, the optimizer would not have to worry about it, and could not even forget about it. Is there a reason to always do this on the side?

If this is guarded by a check to see if the following instruction is a DBG_VALUE, I think builds without debug info would not suffer from increased compile-time. BTW, is there a better way to check if it is a debug build or not, like maybe hasDebugInfo()?

One issue with this is that the caller usually has an iterator, that now gets invalidated if a DBG_VALUE is moved automatically. I think this means splice() has to return an iterator to the next following instruction after the moved one(s). There would also have to be one method that handled reverse iterators. My idea is that splice would only do this if an iterator is passed as an optional argument, so that optimizers could gradually get used to this.

Again, I am not an expert here, just assuming from quick observation that when moving an instruction it is good to

  1. move any connected DBG_VALUEs with the moved MI
  2. update the DebugLoc of MI for its new position.

I have so far just reused the code in MachineSink, but I don't know how good this is or if it could be improved. MachineSink is currently checking through all DBG_VALUEs after MI and picking only the right ones, and stopping at first non-debug MI. This makes sense to me, even though I'm not really clear on why the DBG_VALUEs could end up mixed like that. Could it perhaps be that if we started handling this properly everywhere, we could assume that they are never mixed (that DBG_VALUE that express a use of a register is always to be found directly after the defining MI)? Does this relate perhaps to instructions with multiple defs (like an address post-increment), and if so, should we perhaps scan MI for all explicit defs instead of just the one at index 0?

Of course, Pre-RA this could be done with MRI use-def information instead of the search.

Then there is the question of what to do when moving a range of instructions. Depending on how mixed up the DBG_VALUEs might get (when all optimizers are doing the right thing), one might want to either handle the whole range of instructions to find out what following DBG_VALUEs to also move, or perhaps just look at the last instruction of the range.

Similarly, eraseFromParent() could always also erase any connected DBG_VALUes. There is actually eraseFromParentAndMarkDBGValuesForRemoval, but only with virtual registers, sofar. Should the search for DBG_VALUES after MI be added here to be done post-RA?

I would like to see some clear comment on exactly what is expected from an optimization pass that moves / erases instructions in regards to DBG_VALUE instructions, but can't find one...?

Do you think this makes sense, or should we stick with separate utility functions?

mattd added a subscriber: mattd.Apr 28 2018, 8:24 AM
mattd added a comment.Apr 28 2018, 8:32 AM

Hi jonpa, thanks for working on all of these cases. I had a similar fix for MachineSink.cpp; see https://reviews.llvm.org/D45637 for details. Feel free to either lift that code or I can commit it separately, pending approval. I want to avoid duplicating effort, as I already have an accompany test for that logic as well. My solution creates a routine around the logic for sinking DBG_VALUE instructions, as that same logic is also used in another routine in that file.

jonpa added a comment.Apr 28 2018, 9:24 AM

Hi jonpa, thanks for working on all of these cases. I had a similar fix for MachineSink.cpp; see https://reviews.llvm.org/D45637 for details. Feel free to either lift that code or I can commit it separately, pending approval. I want to avoid duplicating effort, as I already have an accompany test for that logic as well. My solution creates a routine around the logic for sinking DBG_VALUE instructions, as that same logic is also used in another routine in that file.

Interesting, I see you fixed the same post-RA issue as I found :-)

I don't really have an opinion, your patch looks nice to me, except I wonder what you think about moving this out of MachineSink into splice() per my discussion?

There is also the case where an instruction is erased to then be replaced by another one, possibly at another location. In that case it would be needed to have the utility functions available outside of splice(). So it is not true that splice() can handle it all, but it is still interesting to me to let it do the work if the caller says so...

I started to write the utility functions in MachineInstr/MachineBasicBlock, but thinking about it, I wonder if it wouldn't be even better to integrate fully the handling of DBG_VALUEs and DebugLocs into splice()? After all, the DBG_VALUEs should *always* be (re)moved together with the MachineInstr, so why expose this to a particular optimizer pass? If splice() would do this per default, the optimizer would not have to worry about it, and could not even forget about it. Is there a reason to always do this on the side?

If this is guarded by a check to see if the following instruction is a DBG_VALUE, I think builds without debug info would not suffer from increased compile-time. BTW, is there a better way to check if it is a debug build or not, like maybe hasDebugInfo()?

One issue with this is that the caller usually has an iterator, that now gets invalidated if a DBG_VALUE is moved automatically. I think this means splice() has to return an iterator to the next following instruction after the moved one(s). There would also have to be one method that handled reverse iterators. My idea is that splice would only do this if an iterator is passed as an optional argument, so that optimizers could gradually get used to this.

Again, I am not an expert here, just assuming from quick observation that when moving an instruction it is good to

  1. move any connected DBG_VALUEs with the moved MI
  2. update the DebugLoc of MI for its new position.

I have so far just reused the code in MachineSink, but I don't know how good this is or if it could be improved. MachineSink is currently checking through all DBG_VALUEs after MI and picking only the right ones, and stopping at first non-debug MI. This makes sense to me, even though I'm not really clear on why the DBG_VALUEs could end up mixed like that. Could it perhaps be that if we started handling this properly everywhere, we could assume that they are never mixed (that DBG_VALUE that express a use of a register is always to be found directly after the defining MI)? Does this relate perhaps to instructions with multiple defs (like an address post-increment), and if so, should we perhaps scan MI for all explicit defs instead of just the one at index 0?

Of course, Pre-RA this could be done with MRI use-def information instead of the search.

Then there is the question of what to do when moving a range of instructions. Depending on how mixed up the DBG_VALUEs might get (when all optimizers are doing the right thing), one might want to either handle the whole range of instructions to find out what following DBG_VALUEs to also move, or perhaps just look at the last instruction of the range.

Similarly, eraseFromParent() could always also erase any connected DBG_VALUes. There is actually eraseFromParentAndMarkDBGValuesForRemoval, but only with virtual registers, sofar. Should the search for DBG_VALUES after MI be added here to be done post-RA?

I would like to see some clear comment on exactly what is expected from an optimization pass that moves / erases instructions in regards to DBG_VALUE instructions, but can't find one...?

Do you think this makes sense, or should we stick with separate utility functions?

You should probably bring this issue up on llvm-dev to get more feedback from the community.

bjope added a subscriber: bjope.Apr 30 2018, 7:50 AM
bjope added inline comments.
lib/Target/SystemZ/SystemZElimCompare.cpp
195

Shouldn't we iterate over all MI defs (both explicit and implicit)?
If there is a reason for limiting this to check some operands, then perhaps this check should be on the caller side (and then the register could be passed as an argument).

aprantl added a subscriber: davide.Apr 30 2018, 2:46 PM
davide added a subscriber: zhendongsu.EditedApr 30 2018, 2:58 PM

FWIW, I hacked up a fuzzer which found already around 20 (possibly with duplicate) changes in generated code due to -g being passed around. (cc: @zhendongsu)
It was prodded by the bug Geoff reported on the SLPVectorizer which has been fixed on Friday.
I can try to reduce the issues reported and also try to fix them.
About this change, I tend to agree with Adrian it should be split into multiple/separate patches.
Also, every change should have a test case attached.

FWIW, I hacked up a fuzzer which found already around 20 (possibly with duplicate) changes in generated code due to -g being passed around. (cc: @zhendongsu)

There are lots of minor cases. The .cfi_* directives being a scheduling barrier creates a lot of noise in such testing (PR37240).

It was prodded by the bug Geoff reported on the SLPVectorizer which has been fixed on Friday.
I can try to reduce the issues reported and also try to fix them.

@davide can you coordinate with @vsk and the GSOC intern on these?

jonpa added inline comments.May 4 2018, 2:52 AM
lib/Target/SystemZ/SystemZElimCompare.cpp
195

This was the behaviour I found in MachineSink.cpp, which I guess kind of works generally.

I agree with you that it seems right to do a best-effort check since it will not be done (cause compile time regression) without -g.

vsk added a subscriber: gramanas.May 7 2018, 1:08 PM

FWIW, I hacked up a fuzzer which found already around 20 (possibly with duplicate) changes in generated code due to -g being passed around. (cc: @zhendongsu)

There are lots of minor cases. The .cfi_* directives being a scheduling barrier creates a lot of noise in such testing (PR37240).

It was prodded by the bug Geoff reported on the SLPVectorizer which has been fixed on Friday.
I can try to reduce the issues reported and also try to fix them.

@davide can you coordinate with @vsk and the GSOC intern on these?

+ @gramanas. Anastasis is currently working on getting up to speed with LLVM IR and debug info handling in the SROA pass. We can certainly explore tackling some of these issues as a part of his project. I've asked @jonpa to re-send his notes to llvm-dev so we can continue the discussion there.