
[DebugInfo] LiveDebugValues: don't create transfer records for potentially invalid locations
Needs Review · Public

Authored by jmorse on Sep 12 2019, 7:43 AM.

Details

Summary

This patch is the counterpart to D67393, and aims to solve the second problem mentioned there: transfers which are found to be invalid after location propagation occurs should not be inserted. The solution is simple: just don't record transfers until the in-locations are all known to be correct, and then reprocess all instructions a final time, creating transfers.

This involves touching every instruction in the program again, which is unfortunate, but an alternate implementation I wrote adds ~150 lines of code to try and unwind chains of transfers, involving a lot of additional complexity. IMO, this is a reasonable trade-off. As a bonus though, because we don't generate a transfer every time we look at a block, we don't generate redundant identical DBG_VALUEs if we re-visit a block.

There's a fractional reduction in scope-bytes-covered with this patch (<0.1%), all of which must have been invalid.

(I'll have some asan-build-time measurements of this in a couple of hours).
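The two-phase scheme described in the summary can be sketched roughly as below. This is an illustrative toy, not the actual LiveDebugValues code: the names (`processBlock`, `runAnalysis`, `Transfer`) are invented, and a null `Transfers` pointer stands in for "propagate locations only, don't record transfers".

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the real data structures.
struct Transfer { int InsertAfterInst; int VarLocID; };

// Per-block processing; a null Transfers pointer means "propagate locations
// only, do not record transfers". Returns whether live-in/out sets changed.
bool processBlock(int Block, std::vector<Transfer> *Transfers) {
  bool Changed = false;
  // ... examine instructions, update live-in/live-out sets ...
  if (Transfers)
    Transfers->push_back({/*InsertAfterInst=*/0, /*VarLocID=*/Block});
  return Changed;
}

std::vector<Transfer> runAnalysis(int NumBlocks) {
  // Phase 1: iterate to a fixedpoint; recording transfers here would be
  // premature, because in-locations may still be invalidated later.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = 0; B < NumBlocks; ++B)
      Changed |= processBlock(B, /*Transfers=*/nullptr);
  }
  // Phase 2: in-locations are now final; one last pass records transfers.
  std::vector<Transfer> Transfers;
  for (int B = 0; B < NumBlocks; ++B)
    processBlock(B, &Transfers);
  return Transfers;
}
```

The cost is one extra pass over every instruction, but no transfer can ever be recorded from a not-yet-final in-location.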

Diff Detail

Event Timeline

jmorse created this revision. · Sep 12 2019, 7:43 AM
Herald added a project: Restricted Project. · Sep 12 2019, 7:43 AM
jmorse marked an inline comment as done.
jmorse added inline comments.
test/DebugInfo/MIR/X86/live-debug-values-bad-transfer.mir
17–19

I forgot to say in the summary: the absence of this incorrect DBG_VALUE is checked by the --implicit-check-not given to FileCheck.

just don't record transfers until the in-locations are all known to be correct

I'm having trouble with this statement conceptually. Aren't in-locations coming in through join() always correct, just that we didn't reach a fixpoint yet? I'm assuming that at the beginning all in-locations are unknown and only when all incoming basic blocks have outgoing locations, join() adds new incoming locations to be processed. I guess this boils down to how we are not really modeling "unknown" currently?

lib/CodeGen/LiveDebugValues.cpp
488

When is Transfers null?

I'm assuming that at the beginning all in-locations are unknown and only when all incoming basic blocks have outgoing locations, join() adds new incoming locations to be processed.

Ah, this is where the implementation diverges: in fact, if all incoming blocks have the same outgoing location *or* an unknown one for a variable, join() will consider it a valid live-in location. I think it has to be this way because of loops -- for a location that is live throughout a loop, there's a circular dependency between the outgoing location for the back-edge and the live-in location for the loop header. Right now we assume the location is correct for any unknown blocks, and have to invalidate locations later if that doesn't turn out to be true. See for example D66599.

I guess this boils down to how we are not really modeling "unknown" currently?

It turns out to be even more exciting than that -- consider this IR [0] which has badly behaved, but completely legal, control flow. It's illustrated in this highly sophisticated diagram [1]. A single variable location (toast==0) is created in the entry block, and then invalidated with an undef dbg.value in loop3. Because every non-entry/non-exit block is reachable from every other block, the location-invalidation has to be propagated to them all. However, it takes at least one LiveDebugValues iteration for the invalidation to traverse a back-edge, so it takes two LiveDebugValues iterations for the invalidation to move up the CFG and reach loop1. In that time, all the "unknown" blocks have been processed, and as a result locations have to transition from "true" to "false", rather than "unknown" to "false". It might be possible to generate a situation where something transitions from "false" to "true" too, but I haven't looked closely at that yet.

I think the fact that this is happening means I've made LiveDebugValues depart from people's expectations; you wrote in the 2016 mailing list discussion [2]

the information at the nodes can only move in one direction in the lattice and can change at most once.

which isn't true any more because of D66599 removing locations. I suspect I've turned LiveDebugValues into a data-flow-like algorithm rather than a joining-lattices algorithm. Whether this is a good idea or not probably needs discussing; it's all been in aid of getting rid of invalid locations though.

[0] https://reviews.llvm.org/P8163
[1] https://imgur.com/a/YdE9kN7
[2] https://groups.google.com/forum/#!search/llvm-dev/llvm-dev/nlgxIfDQnhs/4zsGbuoTAAAJ

jmorse marked an inline comment as done. · Sep 13 2019, 7:43 AM
jmorse added inline comments.
lib/CodeGen/LiveDebugValues.cpp
488

I've made the main loop (line 1392 on this patch) pass null to signify it doesn't want transfers recording.

A few revisions ago LiveDebugValues was passing enum values "transferChanges", "dontTransferChanges" to indicate whether transfers should be examined, we could do something similar here to make it clearer?


Thanks for the thoughtful explanation. I don't think this is good. I re-read D66599 and it sounds like it is also operating on the pre-D66663 premise that the first iteration is special, to prime the values. That's really bad because it makes reasoning about the algorithm very hard (as evidenced by our conversation ;-) and also turns the algorithm from a bitvector problem into something much more expensive, all the way to the point that I'm not even sure it will always terminate. I think the reason why D66599 needs to remove values is because we implemented join() and the unknown value incorrectly.

Is there a fundamental reason why we couldn't model LiveDebugValues to monotonically move results up the lattice until a fixpoint is reached? Looking at your CFG in [1], if we (conceptually) initialized every VarLoc with unknown and implement join such that join(unknown, X) == unknown, we should get an initial set of out-locs for each basic block that is derived just from the info we have for sure *within* each basic block and gradually more info gets propagated until we may or may not have a join function that produces new in-locs for a basic block. My assumption is that new in-locs would never result in an out-loc flipping from true->false but only in one flipping from unknown->something.

Maybe this assumption is wrong, but if it isn't, we should try hard to get LiveDebugValues to behave that way.

jmorse added a subscriber: Orlando. · Sep 16 2019, 7:27 AM

Adrian wrote:

I don't think this is good.

Cool;

Is there a fundamental reason why we couldn't model LiveDebugValues to monotonically move results up the lattice until a fixpoint is reached? Looking at your CFG in [1], if we (conceptually) initialized every VarLoc with unknown and implement join such that join(unknown, X) == unknown, we should get an initial set of out-locs for each basic block that is derived just from the info we have for sure *within* each basic block and gradually more info gets propagated until we may or may not have a join function that produces new in-locs for a basic block. My assumption is that new in-locs would never result in an out-loc flipping from true->false but only in one flipping from unknown->something.

I think that idea works -- but it seems very close to the "Available Expressions" dataflow algorithm, which (I think) you're suggesting is too expensive, or at least undesirable, right? Just to justify why I think they're similar: my understanding of available-expressions is that it initialises all possible expressions (here VarLocs) to true, then iteratively sets them to false as operands are killed (here clobbered), reaching a fixedpoint of expression availability (here variable locations) from their operand definitions (DBG_VALUEs or transfers) to kills (clobbers).

I think we would have difficulty with the fact that, even if we only implicitly made all locations "unknown" to begin with, we would still end up tracking a large number of "false" locations. Imagine the example in [0,1] were written in MIR, and that the entry variable location were a register, which was clobbered in loop3 rather than being explicitly undef'd. There would be (AFAIUI) no way of telling that the register clobber in loop3 invalidated a variable location we cared about; we would have to track all register clobbers as "false" locations on the off-chance they clobbered a real variable location. (This would work, though: it would be well defined, and would definitely terminate.)

I think the root cause is the circular dependency of locations in loops that I suggested above; we have to either

  • Speculatively consider a location true (until it's invalidated), which is what I've wound up doing, or
  • Collect the total set of invalidated locations

to break that loop.

~

I pointed @Orlando at this to see what he thought, he suggested that I've effectively downgraded "true" locations into being "explicitly unknown" or "maybe" locations that can later be found to be false; and that a true location is just a "maybe" location that never gets invalidated. I'm going to take a shot at proving that the current situation can't turn "false" locations true, which would make me feel better.

~

Alas, I'm also out of the office for the next two weeks so can't reply much further. I'll be at the conference though, in case this is best left on the backburner until then.

NB: I haven't forgotten about this, in particular the concern that the implementation on master right now might not complete. This is next on my list of things to mangle.

(I don't think the branch date for llvm-10 has been announced, but it's probably next month).

jmorse added a comment. · Jan 7 2020, 4:43 AM

Hi,

I think I can narrow the conceptual difference between our interpretations down to this:

we should get an initial set of out-locs for each basic block that is derived
just from the info we have for sure *within* each basic block and gradually
more info gets propagated until we may or may not have a join function that
produces new in-locs for a basic block

Plus, you're saying "up" the lattice and I'm saying "down".

Reading through "Compilers, Principles, Techniques and Tools" (Aho, Lam, Sethi, Ullman) [0], there are actually two flavours of dataflow analyses (section 9.2):

  • Those where all lattice values are initialized to the bottom element of the lattice, then are incrementally moved up the lattice as more truths about the program become available.
  • Those where all lattice values are initialised to the _top_ element of the lattice, and are incrementally moved _down_ the lattice as more information about false-ness in the program becomes available.

Effectively the former assumes nothing and tries to build facts up from the program blocks, whereas the latter assumes everything is true and then removes those facts that can be proved false. I've been exclusively thinking in terms of the second one, and I think your terminology indicates you see LiveDebugValues as the first. (It doesn't help that I've continued to use the term "join", when I think (90%) in the second model it's actually "meet". I can never remember which way round they are).

This would explain the confusion over this patch and D66599: in the first kind of dataflow analysis, during every iteration of the analysis, live-in and live-out sets only take on a lattice element if it's definitely true. Whereas in the second kind, lattice elements aren't necessarily correct until the fixedpoint is reached, because they might yet move down the lattice. That's the essence of this patch (D67500): not trying to determine transfers until after the fixedpoint is reached.

Furthermore, it explains why I see a need for multiple transitions in lattice elements. In the original lattice (putting \bot at the bottom):

True     False
  \       / 
   \     /
    \   /
     \ /
     \bot

You would only ever move up the lattice to True or False once the fact was definitely established -- and in that case, you would only ever transition once. Whereas I think the lattice I'm effectively working with is:

\top
  |
True
  |
False

Where all values are initialized to \top, the "unknown" element (when a block is unvisited). Values decay to True when the block is first visited and the location _could_ be true, and then potentially decay to False if that assumption is later found to be invalid. (Coincidentally, False is also \bot here).

Which dataflow model is better then? The Aho/Lam/Sethi/Ullman book says the first computes a minimal solution, the second a maximal, and for variable locations I suspect we want a maximal solution. But in addition, the Cooper/Torczon "Engineering a Compiler" book (Section 9.3.6) states the second kind is an optimistic algorithm suited for dealing with loops. If you have a loop like this:

entry:
  dbg.value(%0, ...)
  br label %loop
loop:
  call void @nop()
  br i1 %cond, label %loop, label %exit
exit:
  ret void

Then the live-in lattice element of the loop block has a circular dependency with itself due to the loop backedge: it will be True if it is already True. With the second kind of dataflow analysis, we assume the live-in is True and that the backedge will confirm this later: or if the location is False across the backedge then the live-in will later reduce to False. I don't think the first kind of analysis can cope with this case, as you have to start with some kind of assumption or speculation that the live-in is True to later find out whether the backedge live-out is True. Hence, the first kind of analysis produces a minimal solution, the second maximal. This is also the problem that kicked off the 2016 discussion about missing locations [1], locations not being propagated into loops.

In summary:

  • I think we've been talking about two different flavours of dataflow analyses, happily I think I've found the words to describe the difference,
  • I believe we want to use an optimistic dataflow analysis (the second kind above) because it's necessary to propagate locations through loops,
  • This second kind of dataflow analysis requires a lattice transition from True to False because it works by starting with assuming everything is True [2], then removing anything that can be proved to be False.

(I speculated in another comment that I might have written some code that performs False -> True transitions; happily that turns out to not be correct).

[0] Which I'm led to believe is the dragon book, although the international edition doesn't have a dragon on the cover :/
[1] https://groups.google.com/d/msg/llvm-dev/nlgxIfDQnhs/c55kcncIAAAJ
[2] Technically I think the Unknown lattice element in my model is unnecessary; it's a hack to let us avoid initialising every variable location ahead of time. Didn't want to get bogged down with it in this analysis though.

jmorse added a comment. · Feb 4 2020, 9:03 AM

Ping (summary of situation in previous comment).

Thanks for the writeup, this is very helpful. I am indeed thinking of LiveDebugValues as the first kind of analysis.

One thing that we must define, too, is what true and false mean in the context of LiveDebugValues. My understanding is that for each DBG_VALUE d in the analyzed MIR function and each instruction i

  • true means: "it is correct to insert d at position i"
  • false means: "it would be wrong to insert d at position i"

In other words at each instruction i the set of DBG_VALUEs that are "true" are the set of "live" DBG_VALUEs.

As for your loop example: *My* mental model actually used a three-element lattice that is suspiciously close to your second one, but using the opposite direction:

\top (wrong to place DBG_VALUE = false in the above sense / DBG_VALUE is not in the set)
   |
true (DBG_VALUE may be placed / DBG_VALUE is in the set)
   |
\bot (not fully analyzed yet)

We'd initialize everything with \bot.
join is a commutative function that is
join(x, x) = x
join(x, \bot) = x
join(x, \top) = \top
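The join just described can be written down directly. This is only a sketch of the lattice from the discussion (the enum names are invented, and `True`/`Top`/`Bot` here mirror the true/\top/\bot elements above, not any actual LiveDebugValues type):

```cpp
#include <cassert>

// Three-element lattice from the discussion:
//   Bot  = "not fully analyzed yet" (\bot)
//   True = "DBG_VALUE may be placed / is in the set"
//   Top  = "wrong to place DBG_VALUE / not in the set" (\top)
enum class LatticeVal { Bot, True, Top };

// join as specified above: commutative, with
//   join(x, x) = x,  join(x, \bot) = x,  join(x, \top) = \top.
LatticeVal join(LatticeVal A, LatticeVal B) {
  if (A == B) return A;                    // join(x, x) = x
  if (A == LatticeVal::Bot) return B;      // join(\bot, x) = x
  if (B == LatticeVal::Bot) return A;      // join(x, \bot) = x
  return LatticeVal::Top;                  // remaining case {True, Top} -> Top
}
```

Because every rule either returns an argument or \top, repeated joins can only move a value in the one direction \bot -> true -> \top, which is the monotonicity property relied on below.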

After processing the first basic block once, we get:

entry: ; {\bot}
  dbg.value(%0, ...) ; {true}
  br label %loop
loop: ; {\bot}
  call void @nop() ; {\bot}
  br i1 %cond, label %loop, label %exit ; {\bot}
exit: ; {\bot}
  ret void ; {\bot}

processing the second block once, we get

entry: ; {\bot}
  dbg.value(%0, ...) ; {true}
  br label %loop
loop: ; {join(true (entry), \bot (backedge)) = true}
  call void @nop() ; {true}
  br i1 %cond, label %loop, label %exit ; {true}
exit: ; {\bot}
  ret void ; {\bot}

processing the second block twice, we get

loop: ; {join(true (entry), true (backedge)) = true}

and then true is propagated to the final block. Every time a value changes all successor blocks are added to the worklist. The join function ensures that values can only move in one direction, from \bot -> true -> \top.

I *think* I just described something isomorphic to your type (2), just using other names.
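The walkthrough above can be reproduced with a small worklist loop. This is a toy model of the entry -> loop -> exit example, not the real implementation: each block's live-in is the join over its predecessors' live-outs, entry's dbg.value generates the location, and successors are re-queued whenever a live-out changes.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// The three-element lattice and join from the discussion above.
enum class LatticeVal { Bot, True, Top };
enum Block { Entry = 0, Loop = 1, Exit = 2, NumBlocks = 3 };

LatticeVal join(LatticeVal A, LatticeVal B) {
  if (A == B) return A;
  if (A == LatticeVal::Bot) return B;
  if (B == LatticeVal::Bot) return A;
  return LatticeVal::Top;
}

// Computes live-in values for one DBG_VALUE in the CFG
// entry -> loop -> exit, with a loop -> loop backedge.
std::vector<LatticeVal> computeLiveIns() {
  std::vector<std::vector<int>> Preds = {{}, {Entry, Loop}, {Loop}};
  std::vector<std::vector<int>> Succs = {{Loop}, {Loop, Exit}, {}};
  std::vector<LatticeVal> LiveIn(NumBlocks, LatticeVal::Bot);
  std::vector<LatticeVal> LiveOut(NumBlocks, LatticeVal::Bot);

  std::queue<int> Worklist;
  Worklist.push(Entry);
  while (!Worklist.empty()) {
    int B = Worklist.front();
    Worklist.pop();
    // Live-in is the join over all predecessors' live-outs.
    LatticeVal In = LatticeVal::Bot;
    for (int P : Preds[B])
      In = join(In, LiveOut[P]);
    LiveIn[B] = In;
    // Transfer function: entry's dbg.value generates the location; the
    // other blocks pass their live-in straight through.
    LatticeVal Out = (B == Entry) ? LatticeVal::True : In;
    // Re-queue successors whenever a live-out changes.
    if (Out != LiveOut[B]) {
      LiveOut[B] = Out;
      for (int S : Succs[B])
        Worklist.push(S);
    }
  }
  return LiveIn;
}
```

On the first visit to the loop block the backedge still supplies \bot, so join(true, \bot) = true; the second visit confirms join(true, true) = true, exactly as in the walkthrough.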

aprantl added a subscriber: davide. · Feb 4 2020, 1:12 PM
jmorse added a comment. · Feb 6 2020, 9:19 AM

Adrian wrote:

My understanding is that for each DBG_VALUE d in the analyzed MIR function and each instruction i

  • true means: "it is correct to insert d at position i"
  • false means: "it would be wrong to insert d at position i"

In other words at each instruction i the set of DBG_VALUEs that are "true" are the set of "live" DBG_VALUEs.

I agree,

[...]

I *think* I just described something isomorphic to your type (2), just using other names.

Yup, sounds like we're in violent agreement then.

There are now only (!) two big differences between this model and what the LiveDebugValues code actually does. I think they're sound, and will explain below. If it sounds sound, I can bake this into file documentation for everyone's future reference.

1, The implicit "Unknown" value: We don't actually represent your \bot at all; instead, a location in a block is implicitly unknown until the block is explored. Practically, this means that the result of a join(...) can only ever be your true or \top. Given that we always explore in reverse post-order, there's always an explored path back to the entry block, where the value must begin as \top. That means there's always a predecessor that has a value \top or true: so we can never produce a \bot from join anyway.

2, We don't "know" in advance about locations that are discovered during location propagation. For example, when a block contains, after some control flow:

DBG_VALUE $rax, ...
$rbx = COPY $rax

And no other variable location was in $rbx, then during exploration we create a new variable location of "$rbx" without exploring its behaviour from the entry block. Implicitly, all predecessor blocks get initialized to \top value for that variable location. I think this is /OK/ but messy. Given that the variable location was not seen when exploring predecessor blocks, they would have had \top propagated into them anyway from the entry block.

Successor blocks could potentially be initialized to \top too if the new location is discovered after the first dataflow iteration, and that could artificially limit the range of the variable location. However, I think in practice that these locations are _always_ found on the first iteration: we only ever move "up" your lattice and have no unknown values after a block is explored; therefore the maximum number of locations will be "true" on the first iteration; and so any new location that could be found _must_ be found on the first iteration.
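The COPY case above can be modelled as a small transfer-function step. This is a deliberately simplified sketch with invented names: a map from variable to the register holding it, updated when a copy is seen. The real pass tracks far more state (spills, fragments, clobbers) and decides whether to keep, duplicate, or move the location; this sketch simply follows the copy.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: variable name -> register currently holding its value.
using VarLocMap = std::map<std::string, std::string>;

// On "DestReg = COPY SrcReg": any variable whose location is SrcReg is
// given the newly-discovered location DestReg. This is how a location like
// $rbx can appear mid-propagation, without having been tracked from the
// entry block.
void transferRegisterCopy(VarLocMap &Locs, const std::string &DestReg,
                          const std::string &SrcReg) {
  for (auto &KV : Locs)
    if (KV.second == SrcReg)
      KV.second = DestReg; // new location discovered during exploration
}
```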


That would be great, and if any of our functions can benefit from renaming to fit established nomenclature, we should do that, too.

1, The implicit "Unknown" value: We don't actually represent your \bot at all, instead a location in a block is implicitly unknown until the block is explored.

It's not represented in the *data* at all since we only have a set, it just falls out of the algorithm. Can you remind me why this works again? Is it because we visit each block locally once before calling any join() function?

Practically, this means that the result of a join(...) can only ever be your true or \top. Given that we always explore in reverse-post-order, there's always an explored path back to the entry block where the value must begin \top. That means there's always a predecessor that has a value \top or true: so we can never produce a \bot from join anyway.

2, We don't "know" in advance about locations that are discovered during location propagation. For example, when we have in a block, after some control flow:

DBG_VALUE $rax, ...
$rbx = COPY $rax

And no other variable location was in $rbx, then during exploration we create a new variable location of "$rbx" without exploring its behaviour from the entry block. Implicitly, all predecessor blocks get initialized to \top value for that variable location. I think this is /OK/ but messy. Given that the variable location was not seen when exploring predecessor blocks, they would have had \top propagated into them anyway from the entry block.

How is this different from discovering a DBG_VALUE the first time?

Successor blocks could potentially be initialized to \top too if the new location is discovered after the first dataflow iteration, and that could artificially limit the range of the variable location. However, I think in practice that these locations are _always_ found on the first iteration: we only ever move "up" your lattice and have no unknown values after a block is explored; therefore the maximum number of locations will be "true" on the first iteration; and so any new location that could be found _must_ be found on the first iteration.

It's not represented in the *data* at all since we only have a set, it just falls out of the algorithm. Can you remind me why this works again? Is it because we visit each block locally once before calling any join() function?

I think it's because at least one predecessor is guaranteed to have been visited first (except for the entry block), which ensures there is at least one non-unknown lattice value for join(...) to return. Inductively,

  • The entry block always starts with \top,
  • join(...) can only return Unknown (\bot) if all inputs are Unknown

And because we only visit a block when its predecessors have been visited, join() will never see a block where all its predecessors have Unknown values.

This might fall apart if the entry block is also a loop head, but I think the entry block always starting with \top might fix that.
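The inductive argument above rests on a property of reverse post-order that can be checked directly: every reachable non-entry block has at least one predecessor (its DFS tree parent, at minimum) appearing earlier in the RPO. This sketch, with invented helper names, computes an RPO and verifies the property; it is not the LLVM RPO machinery.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Computes a reverse post-order of the blocks reachable from Entry,
// given each block's successor list.
std::vector<int> reversePostOrder(const std::vector<std::vector<int>> &Succs,
                                  int Entry) {
  std::vector<int> PostOrder;
  std::vector<bool> Seen(Succs.size(), false);
  std::function<void(int)> DFS = [&](int N) {
    Seen[N] = true;
    for (int S : Succs[N])
      if (!Seen[S])
        DFS(S);
    PostOrder.push_back(N);
  };
  DFS(Entry);
  return {PostOrder.rbegin(), PostOrder.rend()};
}

// True if every non-entry block in RPO has some predecessor that was
// already visited -- i.e. join() never sees all-unknown inputs.
bool everyBlockHasVisitedPred(const std::vector<std::vector<int>> &Succs,
                              const std::vector<int> &RPO) {
  std::vector<bool> Visited(Succs.size(), false);
  for (int N : RPO) {
    if (N != RPO.front()) {
      bool HasVisitedPred = false;
      for (size_t P = 0; P < Succs.size(); ++P)
        for (int S : Succs[P])
          if (S == N && Visited[P])
            HasVisitedPred = true;
      if (!HasVisitedPred)
        return false;
    }
    Visited[N] = true;
  }
  return true;
}
```

The property holds even with backedges (the DFS tree parent always finishes later, so it precedes its children in RPO), which is why only the entry block needs special handling.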

[Locations being discovered during exploration]

How is this different from discovering a DBG_VALUE the first time?

I guess it isn't different really. For *any* variable location we might only start tracking it when it's discovered during exploration, instead of tracking it from the entry block. To rephrase the question: do we still correctly implement the dataflow analysis if we begin tracking variable locations after exploration starts, instead of tracking from the entry block? As explained above, I think we get correct values for previously explored blocks (i.e., \top), and I think we're not artificially limiting "new" locations to \top in later blocks, because they should all be found on the first iteration.