Page MenuHomePhabricator

[LiveDebugValues] Introduce entry values of unmodified params
AcceptedPublic

Authored by djtodoro on Sep 30 2019, 5:33 AM.

Details

Summary

The idea is to remove front-end analysis for the parameter's value modification and leave it to the value tracking system. Front-end in some cases mark a parameter as modified even the line of code that modifies the parameter gets optimized, that implies that this will cover more entry values even for unmodified parameters (GCC generates entry values in the situations like that, because it performs such analysis within its own value tracking system). In addition, extending the support for modified parameters will be easier with this approach.

Since the goal is to recognize if a parameter’s value has changed, the idea at very high level is: If we encounter a DBG_VALUE other than the entry value one describing the same variable (parameter), we can assume that the variable’s value has changed and we should not track its entry value any more. That would be ideal scenario, but due to various LLVM optimizations, a variable’s value could be just moved around from one register to another (and there will be additional DBG_VALUEs describing the same variable), so we have to recognize such situation (otherwise, we will lose a lot of entry values) and salvage the debug entry value.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2019, 5:33 AM

This is very first (kind of a template) patch and it should be tested and improved more, but the main point is that this will remove the performance regression and improve the debug info quality. :)

So we've come full circle :-)
I'm not a fan of lumping arbitrary funcitonality into LiveDebugValues, but if we can keep it readable this is probably the best place for it.

llvm/lib/IR/DebugInfoMetadata.cpp
966 ↗(On Diff #222392)

This appears to be of limited use since many DIExpressions are more complex. Is this meant to detect spill locations? If yes, could we mark them differently when we create them instead of trying to detect them here?

jmorse added a comment.Oct 3 2019, 3:07 AM

Hi Djordje,

Djordje wrote:

[...] due to various LLVM optimizations, a variable’s value could be just moved around from one register to another (and there will be additional DBG_VALUEs describing the same variable), so we have to recognize such situation (otherwise, we will lose a lot of entry values) and salvage the debug entry value.

The chain of patches I have in D67393 / D67398 / D67500 could be useful here -- the DBG_VALUEs that LiveDebugValues inserts for transfers can cause other problems, and in those patches I've attempted to delay their creation until after LiveDebugValues is finished. In particular, from the summary of D67398,

I wrote:

previously VarLocs could effectively build long chains where the MI field referred to the previous transfer DBG_VALUE, eventually rooted in a [non-transfer] DBG_VALUE. Now, however, VarLocs can only ever refer to the [non-transfer] DBG_VALUE, which I think is slightly neater

This means that whenever a transfer is created by transferRegisterCopy, no DBG_VALUE is created, instead a VarLoc is created that points back to the "original" DBG_VALUE. If I understand you correctly (75% confidence), with those patches you would be able to identify whether a location is the entry value by examining the VarLoc's MI field, even if the location has been transferred in the meantime.

djtodoro marked an inline comment as done.Oct 4 2019, 4:59 AM

@aprantl

So we've come full circle :-)

Unfortunately, yes. :)

I'm not a fan of lumping arbitrary funcitonality into LiveDebugValues, but if we can keep it readable this is probably the best place for it.

+1

@jmorse

Hi Jeremy,

I think as well the set of patches you applied could be useful here, I see 2/3 are already committed and I will rebase this on top of that. :)

If I understand you correctly (75% confidence), with those patches you would be able to identify whether a location is the entry value by examining the VarLoc's MI field, even if the location has been transferred in the meantime.

Yes, that is the goal.

llvm/lib/IR/DebugInfoMetadata.cpp
966 ↗(On Diff #222392)

I am aware of that, we should definitely make it more useful somehow.

The point is that this is meant to be used to detect modification of a parameter that could be expressed in terms of its entry value. Up to this point we supported only unmodified parameters, but at this point we want to extend it and try to salvage entry values even for modified parameters (as GCC does). The problem is that we are analyzing only DBG_VALUEs in order to distinguish whether a parameter's value has changed and if so, the idea is only to look at the DIExpression to distinguish whether to track the modification (e.g. expressions like plus_const, minus_const, etc.) or not, by not analyzing the actual MI that performs the modification that the DBG_VALUE describes. In addition, the problem is in many cases the instruction performing the parameter's modification gets optimized even at IR level, and we have just the DBG_VALUE describing that modification. Consider the test case attached:

...
b = b + 7;
...

the add intrinsic will be removed after the Combine redundant instructions pass and we will have just the dbg.value() at the place describing actual modification, but we want to generate the expression like DW_OP_LLVM_entry_value, 1, DW_OP_plus_uconst, 7, DW_OP_stack_value in LiveDebugValues.

I think that I should split this patch into support for unmodified and modified parameters, and that will be easier to review.

djtodoro updated this revision to Diff 224965.Oct 15 2019, 12:47 AM
djtodoro retitled this revision from WIP: [LiveDebugValues] Introduce entry values of unmodified/modified params to [LiveDebugValues] Introduce entry values of unmodified params.
djtodoro edited the summary of this revision. (Show Details)
djtodoro added reviewers: aprantl, vsk, jmorse, dstenb.
djtodoro removed subscribers: jmorse, dstenb, vsk, aprantl.

-Split this into unmodifed/modified parameters support
-Clean the code a bit
-Rebasing on the top of the latest changes in the pass

djtodoro added a comment.EditedOct 15 2019, 1:17 AM

With the new approach, by building the GDB 7.11 we confirm that there is no performance regression any more (by using the debug entry values experimental option).

-g -O2-g -O2 -Xclang -femit-debug-entry-values (old approach)-g -O2 -Xclang -femit-debug-entry-values (new approach)
build time355 seconds524 seconds358 seconds

By using the llvm-locstats tool, we confirm that the improvement in terms of debug location coverage is still there.

-The stats from gdb compiled with -g -O2:
llvm-locstats -only-formal-parameters gdb

==================================================
              Debug Location Statistics
==================================================
        cov%           samples       percentage(~)
--------------------------------------------------
        0%               5533            7%
        1-9%             2889            4%
        10-19%           3149            4%
        20-29%           2696            3%
        30-39%           2545            3%
        40-49%           2380            3%
        50-59%           2311            3%
        60-69%           2262            3%
        70-79%           2467            3%
        80-89%           3080            4%
        90-99%          10140            14%
        100%            29982            43%
================================================
-the number of debug variables processed: 69434
-PC ranges covered: 64%
================================================

-The stats from gdb with entry values (new approach):
llvm-locstats -only-formal-parameters gdb

==================================================
              Debug Location Statistics
==================================================
        cov%           samples       percentage(~)
--------------------------------------------------
        0%               5527            7%
        1-9%             1854            2%
        10-19%           2161            3%
        20-29%           1731            2%
        30-39%           1642            2%
        40-49%           1556            2%
        50-59%           1546            2%
        60-69%           1596            2%
        70-79%           1805            2%
        80-89%           2466            3%
        90-99%           6437            9%
        100%            41113            59%
================================================
-the number of debug variables processed: 69434
-PC ranges covered: 73%
================================================
djtodoro marked an inline comment as done.Oct 15 2019, 1:23 AM
djtodoro added inline comments.
llvm/lib/CodeGen/LiveDebugValues.cpp
816–817

This is only NFC. This can go as a separate commit, since this piece of code is not used any more.

dstenb added inline comments.Oct 15 2019, 3:05 AM
llvm/lib/CodeGen/LiveDebugValues.cpp
94–99

As a performance optimization, perhaps consider an early exit like this:

if (!Op.isReg())
  return false;

to avoid running the query functions if not needed.

312

seting -> setting

838

Deliting -> Deleting

1239

In order to improve the readability, can you move this into a helper function?

djtodoro updated this revision to Diff 225004.Oct 15 2019, 5:23 AM

@dstenb Thanks a lot for your comments!

-Addressing comments

vsk added a comment.Oct 15 2019, 4:07 PM

Hi @djtodoro, thanks for the patch as always. I'm curious about the DbgEntryValueMovement field. Could you explain it in a little more detail? At a high level, IIUC, it's used to mark copies of entry values for the purpose of simplifying location lists (i.e. keep an original AT_entry_value location description even if there are copies). Is that correct? Does the approach handle there being multiple copies of an entry value within a block?

llvm/lib/CodeGen/LiveDebugValues.cpp
812

Why is the const_cast needed?

935

It would be good to simplify this expression, possibly by sinking it into a helper function that returns a VarLoc.

1224

Why move the DestRegOp->isDef() check to after the isCalleSavedReg lambda definition?

1251

auto &EntryValue?

djtodoro marked 5 inline comments as done.Oct 16 2019, 4:52 AM

@vsk Thanks for the comments! Sure, I will try. :)

The idea from very high level, in this patch, is to generate entry values (when needed) for unmodified parameters (for parameters whose value has not changed throughout a function), by examining the code in at this stage of LLVM pipeline (we removed the front-end part due to performance regression). The main premise here is that the parameter’s value has changed if we encounter a new DBG_VALUE within a block describing the same parameter, and if so we stop tracking the entry value of such parameter. Nevertheless, there are many cases where (due to various optimizations) a parameter value is only moved (copied) around, from one register to another one. We should recognize such situation and keep tracking of entry value of such parameter, since the value actually has not changed. For that scenario, we use the DbgEntryValueMovement field to remember such movement and use it when creating the VarLoc for such entry value. This should handle multiple copies within a block.

llvm/lib/CodeGen/LiveDebugValues.cpp
812

I have use it for downstream code purposes, it was mistake keeping it here... Thanks!

1224

Mistake.

1251

We change the EntryValue with the setDbgEntryValueMovement().

djtodoro updated this revision to Diff 225200.Oct 16 2019, 5:22 AM
djtodoro marked an inline comment as done.

-Addressing comments

Hi,

Djordje wrote:

Nevertheless, there are many cases where (due to various optimizations) a parameter value is only moved (copied) around, from one register to another one. We should recognize such situation and keep tracking of entry value of such parameter, since the value actually has not changed.

A large number of these cases Should (TM) have gone away with D67393, as any transfer DBG_VALUEs created by LiveDebugValues don't enter the instruction stream until after LiveDebugValues completes. In fact, applying the two parent patches and the test from this change to master (which I've done in this tree here [0]), the test already passes for me. (There's a decent chance I mucked something up though).

However, the test case demonstrates something I've found odd in LLVM instruction selection: it creates _two_ DBG_VALUEs for arguments. The code that does it is here [1], dates back to about 2010, and sounds like it tries to propagate variable locations in a way that is now LiveDebugValues domain. If it didn't create an extra DBG_VALUE for every argument, we might not need special handling for extra DBG_VALUEs here. Removing the code in [1] makes very little difference to variable-location coverage statistics, but does trip 12 tests.

I don't have any specific feedback on this patch, but I get the feeling that it might not be necessary if we eliminated [1]; I don't suppose anyone remembers what it's there for?

(Sorry for driving this review in an unrelated direction, but it'd be nice if we could design-out this problem).

[0] https://github.com/jmorse/llvm-project/commits/d68209_experiment
[1] https://github.com/llvm/llvm-project/blob/f24ac13aaae63d92317dac839ce57857a7b444dc/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp#L587

llvm/lib/CodeGen/LiveDebugValues.cpp
1604–1608

Note that this hunk didn't apply cleanly for me; on master this line (1319/1449) reads

!MI.getDebugExpression()->isFragment())
llvm/test/DebugInfo/MIR/X86/entry-value-of-modified-param.mir
20–23

Best to not bake in the metadata numbers directly, and instead to recognise the DILocalVariable records, capture their number, and check for DBG_VALUEs with those captured numbers.

dstenb added inline comments.Oct 16 2019, 3:46 PM
llvm/lib/CodeGen/LiveDebugValues.cpp
1604–1608

That is from D66746. That patch has been postponed a bit as I've been working on other patches, but I'll pick up that again.

djtodoro marked an inline comment as done.Oct 17 2019, 12:34 AM

Hi Jeremy,

Thanks a lot for diving into this!

However, the test case demonstrates something I've found odd in LLVM instruction selection: it creates _two_ DBG_VALUEs for arguments. The code that does it is here [1], dates back to about 2010, and sounds like it tries to propagate variable locations in a way that is now LiveDebugValues domain. If it didn't create an extra DBG_VALUE for every argument, we might not need special handling for extra DBG_VALUEs here. Removing the code in [1] makes very little difference to variable-location coverage statistics, but does trip 12 tests.

Hmm... I see. I agree that that should be the LiveDebugValues job..
Initially when I did the investigation, I have not used the D67393, but I see that it should remove large portion of the cases we care about here.

I don't have any specific feedback on this patch, but I get the feeling that it might not be necessary if we eliminated [1];

Anyhow, I think even we decide to remove the code from [1], we would need one part of this patch, to avoid some unwanted entry values generated. But, the code (and the analysis) will be away more simpler than the current one.

(Sorry for driving this review in an unrelated direction, but it'd be nice if we could design-out this problem).

I really appreciate the comments, since resolving the issue from [1] will simplify this a lot. :)

llvm/lib/CodeGen/LiveDebugValues.cpp
1604–1608

I forgot to add a note that this relies on the D66746 as well. Sorry for that, I will update the diff.

After eyeballing the code in SelectionDAG that generates the additional DBG_VALUEs, it looks like disabling that code will generate a variety of other location regressions. LiveDebugValues doesn't always pick the "best" location when there are multiple options, see for example [0].

With that in mind, this code makes sense. My understanding of what's happening here is that whenever there's a register-copy-transfer, if it's from an entry-value-location then DebugEntryVals map records the transfer destination. Then, if another DBG_VALUE for the same variable is encountered specifying the same destination, it's interpreted as being caused by the register-copy, and so entry values can be propagated over it.

I think what's missing is handling of control flow branching / merging. As far as I can see, DebugEntryVals is a single map, that gets updated whenever a block is processed. If we had a standard diamond control flow like this:

entry:
    setcc %somereg
    jnz block2
block1:
    mov %rax, %rcx
    jmp exit
block2:
    mov %rcx, %rdx
    jmp exit

Then the reverse-post-order order that LiveDebugValues follows will mean it visits blocks in the order {entry, block1, block2, exit} [1]. Isn't there a risk that changes to DebugEntryVals when processing block1, will be visible / will affect the handling of block2?

The rest of LiveDebugValues handles this by having a per-basic-block collection of in-locs / locations and updating them in the 'join' method. That would probably solve this potential issue too, but it add more complexity :(

[0] https://bugs.llvm.org/show_bug.cgi?id=42834
[1] although block1 and block2 might get flipped, I can't remember precisely.

llvm/lib/CodeGen/LiveDebugValues.cpp
317

It's probably worth asserting that you expect the incoming location to be a RegisterKind, should make it clearer for other readers.

473

Can there be fragments of entry values? Typically a <DILocalVariable, InlinedAt, Fragment> tuple is needed to uniquely identify a variable location. I understand inlining doesn't make sense in the context of entry values, but if there can be fragments, then the map key here may need to account for them, now or in the future.

805–806

"entry value if _we_ encounter"

@jmorse Thanks for the comments!

With that in mind, this code makes sense. My understanding of what's happening here is that whenever there's a register-copy-transfer, if it's from an entry-value-location then DebugEntryVals map records the transfer destination. Then, if another DBG_VALUE for the same variable is encountered specifying the same destination, it's interpreted as being caused by the register-copy, and so entry values can be propagated over it.

That is right.

I think what's missing is handling of control flow branching / merging. As far as I can see, DebugEntryVals is a single map, that gets updated whenever a block is processed. If we had a standard diamond control flow like this:
entry:
setcc %somereg
jnz block2
block1:
mov %rax, %rcx
jmp exit
block2:
mov %rcx, %rdx
jmp exit
Then the reverse-post-order order that LiveDebugValues follows will mean it visits blocks in the order {entry, block1, block2, exit} [1]. Isn't there a risk that changes to DebugEntryVals when processing block1, will be visible / will affect the handling of block2?

That could be a problem as well as tracking of clobbering of the register holding the copy of the debug entry value. I work on that as well as on a NFC that should simplify the code a bit.

djtodoro marked an inline comment as done.Wed, Oct 23, 4:12 AM
djtodoro added inline comments.
llvm/lib/CodeGen/LiveDebugValues.cpp
473

At the moment, we do not support fragments, so only the DILocalVariable is enough for now.

djtodoro updated this revision to Diff 227668.Mon, Nov 4, 2:34 AM
djtodoro edited the summary of this revision. (Show Details)

-Make the debug entry values mapping local to the OpenRanges
-At the begging of the analysis record the entry values by making the 'backup' locations
-When needed, the backup location will turn into the primary location of a parameter
-Support diamond structure of bbs
-Add more tests

I was wondering if it would make sense to let LiveDebugValues compute all available locations, and then have DbgEntityHistoryCalculator pick the best one (e.g., prefer anything over an entry value). The downside of that approach is that we'd need to reimplement some of DbgEntityHistoryCalculator in LiveDebugValues for this to work: For example whether DBG_VALUE should truncate a previous DBG_VALUE's range (e.g, if the previous one was constant) or whether it is an alternative location (such as, if a register is copied). I'm not saying that we need to do this now. But it is something we could evolve LiveDebugValues towards.

llvm/lib/CodeGen/LiveDebugValues.cpp
92

Can you adda a /// comment that explains, why this is useful / why a caller might want to know this?

I was wondering if it would make sense to let LiveDebugValues compute all available locations, and then have DbgEntityHistoryCalculator pick the best one (e.g., prefer anything over an entry value). The downside of that approach is that we'd need to reimplement some of DbgEntityHistoryCalculator in LiveDebugValues for this to work: For example whether DBG_VALUE should truncate a previous DBG_VALUE's range (e.g, if the previous one was constant) or whether it is an alternative location (such as, if a register is copied). I'm not saying that we need to do this now. But it is something we could evolve LiveDebugValues towards.

@aprantl +1. I think that should be our next step. That should simplify the code (here) a lot, and besides that, we will have the other benefits such as minimal debug location list indicating smaller .debug_loc section.

djtodoro updated this revision to Diff 227847.Tue, Nov 5, 4:40 AM

-Add the comment for the isRegOtherThanSPAndFP()

jmorse added a comment.Thu, Nov 7, 8:39 AM

This looks good -- some minor nits inline. Keeping the backup-locations in the live-out sets as special records seems like a good solution for avoiding unwanted control flow effects, as shown with the diamond pattern.

Would I be right in thinking that the OpenRangesSet::erase method will now only erase entry value backup locations if you pass in a VarLoc that's also a backup location? This seems important, so that you can call erase and insert when a location _move_ happens, without disturbing the backup value. If this is the case, it should probably go into a comment somewhere documenting why two different sets of Vars / EntryValuesBackupVars are kept, it could be missed otherwise. (If this is documented, I missed the comment, oops).

llvm/lib/CodeGen/LiveDebugValues.cpp
93

"... to avoid basing the debug entry values on the registers..." ?

391

Seeing how VarLocIDs is just a UniqueVector, isn't zero a legitimate index? I'm not aware of the first element being initialized to anything specific.

534–537

Won't having "Vars.empty()" as a clause here mean the assertion fires if the OpenRanges isn't empty, rather than testing whether it's empty?

831–833

If I understand this, if the immediately preceeding instruction is a debug instruction, it's considered to be propagated? This seems unreliable to me -- DBG_VALUEs often appear in packs.

llvm/test/DebugInfo/MIR/X86/entry-values-diamond-bbs.mir
10–15

IMHO more context should be checked for the entry value DBG_VALUEs, i.e. that they come after the correct instruction. The movement of "b" through these blocks is non-trivial, this test should make sure the entry values land in the right place as well as the right block. Same with the test below.

djtodoro marked 5 inline comments as done.Tue, Nov 12, 3:39 AM

@jmorse Thanks for your comments! I tried to address them all!

llvm/lib/CodeGen/LiveDebugValues.cpp
391

I this this comments stands for getEntryValueBackup(), and I have addressed it. Thanks!

534–537

I am not sure any more, since this is not used at the current code..

831–833

Yes, I was not happy with the piece of code. I was gonna put a HACK marker above this, but I think it can be a TODO for now.

llvm/test/DebugInfo/MIR/X86/entry-values-diamond-bbs.mir
10–15

I agree. I have addressed this.

djtodoro updated this revision to Diff 228847.Tue, Nov 12, 3:42 AM

-Update the tests with more checking of the context
-Make more comments of the new functionality
-Address the reviewer's comments

aprantl added inline comments.Tue, Nov 12, 8:59 AM
llvm/lib/CodeGen/LiveDebugValues.cpp
92

nit: /// If \p Op is a ...

djtodoro updated this revision to Diff 229017.Tue, Nov 12, 11:49 PM

-Addressing the latest (nit) comment

djtodoro marked an inline comment as done.Tue, Nov 12, 11:49 PM
djtodoro updated this revision to Diff 229285.Thu, Nov 14, 5:22 AM

-Rebasing on top of the latest LLVM
-Add an additional LLVM_DEBUG() output for debugging purpose within the recordEntryValue()
-Refactor the code a bit
-Add the stack machine function attribute to a test, since it was needed for the new approach for distinguishing the callee saved regs

djtodoro edited the summary of this revision. (Show Details)Thu, Nov 14, 5:22 AM
jmorse accepted this revision.Fri, Nov 15, 6:53 AM

IMO, this LGTM, with the rider that I think the "empty" method needs to be made safer (see inline).

Keeping the main / "backup" locations starts introducing a hierarchy of longevity for locations, which is where I think we want to be heading.

llvm/lib/CodeGen/LiveDebugValues.cpp
391

Reading the comments from UniqueVector.h:

/// insert - Append entry to the vector if it doesn't already exist.  Returns
/// the entry's index + 1 to be used as a unique ID.

So zero really is a special value, excellent.

534–537

I think it should probably read:

assert(Vars.empty() == EmptyValuesBackupVars.empty &&
           Vars.Empty() == VarLocs.empty() &&
           "open ranges are inconsistent");
return VarLocs.empty()

To ensure that if one is empty, all the others are. Simply asserting that Vars is empty will lead to assertions firing for completely valid states of the pass.

This revision is now accepted and ready to land.Fri, Nov 15, 6:53 AM
djtodoro marked an inline comment as done.Mon, Nov 18, 7:01 AM

@jmorse Thanks for your comments!

Keeping the main / "backup" locations starts introducing a hierarchy of longevity for locations, which is where I think we want to be heading.

Thanks. I agree. We may think of refactoring this in future by using some different data structures (e.g. lists, arrays or queue) to store all valid location (primary and backups) for a variable, and try to avoid Backup kinds for every new desirable location that can occur as backup.

llvm/lib/CodeGen/LiveDebugValues.cpp
534–537

It makes sense. Thanks.

djtodoro updated this revision to Diff 229838.Mon, Nov 18, 7:04 AM

-Addressing comments

NOTE: This may be updated with a change in an LLDB test. The issue was found when committing the D68206.