This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] MachineSink: Insert undef DBG_VALUEs when sinking instructions, try to forward copies
ClosedPublic

Authored by jmorse on Feb 14 2019, 7:56 AM.

Details

Summary

When we sink DBG_VALUEs at the moment, we simply move the DBG_VALUE instruction to below the sunk instruction. However, we should also mark the variable as being undef at the source location, to terminate any earlier variable location. This patch does that -- plus, if the instruction being sunk is a copy, it attempts to propagate the copy through the DBG_VALUE, replacing the destination with the source.

To avoid any kind of subregister shennanigans, vreg copy propagation only happens if all the subregisters agree; for physical registers we only propagate if the DBG_VALUE operand is the same as the destination of the copy. So:

%1 = COPY %0
DBG_VALUE %1

Would copy-prop, while

%1 = COPY %0
DBG_VALUE %1.subreg_8bit

Would not. Additional analysis might determine this to be safe, but I haven't implemented it here.

Likewise after regalloc:

$eax = COPY $ecx
DBG_VALUE $eax

Would constant-prop, while

$ax = $cx
DBG_VALUE $eax

Would not.

With this patch, building clang-3.4 yields a fractional number more number of covered variables, and roughly 0.5% more scope-bytes coverage.

Diff Detail

Event Timeline

jmorse created this revision.Feb 14 2019, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 7:56 AM
aprantl added inline comments.Feb 14 2019, 8:41 AM
lib/CodeGen/MachineSink.cpp
798 ↗(On Diff #186844)

I may be misunderstanding, but if it isn't a vreg, don't we need to check that there are no defs of the reg in between? Or is this all pre-ra?

807 ↗(On Diff #186844)

Does this get easier to read if you unconditionally call setReg(0) and then overwrite it in the copy case?

jmorse updated this revision to Diff 186983.Feb 15 2019, 2:30 AM

Explicitly test for whether we're pre or post regalloc when deciding whether a copy can be propagated, refactor test logic.

jmorse marked 4 inline comments as done.Feb 15 2019, 2:37 AM
jmorse added inline comments.
lib/CodeGen/MachineSink.cpp
798 ↗(On Diff #186844)

Good question -- this code is called by both pre and post RA code. For post-ra it's guaranteed that there are no defs in between by the calling code: otherwise it would not be legal to sink the copy in the first place. For pre-ra we can rely on the SSA-ness of the function to guarantee validity.

However, I've been using the "both-are-vregs" and "both-are-physregs" tests as a proxy for whether we're pre or post RA, which isn't necessarily sound. I don't believe the pre-ra machine sinker will sink anything with meaningful physreg operations, but I've added an explicit test to make this more robust.

807 ↗(On Diff #186844)

It does, I've done that and simplified the tests into two (large) conditionals.

bjope added a comment.Feb 15 2019, 2:52 AM

I have some doubt about this. Mainly the part about inserting undefs. Although, some of my doubts are partially based on the big-black-hole regarding how debug info is supposed to work for heavily optimized code (and more specifically how this is supposed to work in LLVM based on what we got today).

I'll try to describe this by an example. Assume the C-code looks like this:

1:  x = y;
2:  z = z & q1;
3:  x = x +3;
4:  z = z | q2;

then we might end up MIR like this as input to MachineSink

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...
%4 = OR %2, ...

Now assume that the ADD is sunk (for simplicity in this example just to below the OR and not to another BB).

Isn't it perfectly OK to get

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%4 = OR %2, ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

making it appear as "x" is equal to "y" up until the ADD has been executed.

As I understand it this patch would introduce a DBG_VALUE saying that we do not know the value of "x" when doing the OR.

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
DBG_VALUE %noreg, %noreg, "x", ...
%4 = OR %2, ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

But we have just delayed the ADD a little bit (so we have not really executed line 3 yet).

It might also be tempting to express the add in the DIExpression (a "salvage" kind of solution):

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
DBG_VALUE %1, %noreg, "x", !DIExpresssion("%1 + 3")
%4 = OR %2, ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

I think the "salvage" kind of solution is incorrect here. We have not optimized away the add, it has just been moved. When debugging it would be confusing if "x" already has the value "y + 3" before executing the ADD (which in this example probably is the only instruction with line 3 as debug location).

The alternative of making "x" appear as optimized out is also weird IMO. We have not lost track of the value of "x". It first gets the value %1 and then after the ADD it has the value %3.

Maybe it is a philosophical question. Is the compiler scheduling/reordering source instructions (in the debugger it will appear as source statements are executed in a random order), or are we scheduling/reordering machine instructions (and in the debugger it still should appear as if we execute variable assignments in the order of the source code). VLIW-scheduling, tail merging, etc, of course make things extra complicated, since we basically will be all over the place all the time.

The "copy-prop" part of this patch might be OK. But couldn't it just be seen as a the same scenario as with the ADD above, where we are delaying the assignment?

(Sorry if my simplified examples doesn't make sense for the actual problem that you attempt to solve.)

Does it matter that we actually sink into a later BB here? Is this fixing some problem where it from a debugging perspective would appear wrong if the variable does not get the new value before the end of the BB?

Hi,

I have some doubt about this. Mainly the part about inserting undefs. Although, some of my doubts are partially based on the big-black-hole regarding how debug info is supposed to work for heavily optimized code (and more specifically how this is supposed to work in LLVM based on what we got today).

No problemo -- my debuginfo knowledge is pretty fresh anyway, most of this is based on educated guesses of how it's supposed to work.

My understanding is that the difficulty comes from the re-ordering of DBG_VALUEs, because it re-orders how variable assignments appear. This can lead to debuggers displaying states of the program (i.e. a set of assignments to variables) that did not exist in the original program. If we take the example and put in DBG_VALUE instructions for the 'z' variable, and then sink the addition as you describe, we get:

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
DBG_VALUE %2, %noreg, "z", ...
%4 = OR %2, ...
DBG_VALUE %4, %noreg, "z", ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

If a debugger landed after the OR instruction (before the sunk ADD is executed) then it would observe the value of "y" from line 4, but the value of "x" from line 1, which was overwritten in the original program. This is a state that didn't originally exist -- and it has the potential to mislead developers, for example if iterating over pairs of something and the two values displayed aren't actually paired.

For the exact example you give where there *isn't* any re-ordering, then we're probably needlessly reducing the range where we have variable locations, however that's an exception to a general rule IMHO (and this patch is implementing the general rule).

One response to reordering would be "Yeah, that's going to happen when your code is optimised", which is totally true. IMHO having variables read "optimized out" is slightly superior because it doesn't represent a nonexistant state of the program. That might force developers to dig deeper to find out what's going on (assuming they can step the debugger), but on the other hand they're guaranteed that when there is a variable location, it's accurate. Otherwise developers might not fully trust the debuginfo they're getting.

Maybe it is a philosophical question. Is the compiler scheduling/reordering source instructions (in the debugger it will appear as source statements are executed in a random order), or are we scheduling/reordering machine instructions (and in the debugger it still should appear as if we execute variable assignments in the order of the source code). VLIW-scheduling, tail merging, etc, of course make things extra complicated, since we basically will be all over the place all the time.

To me it's the latter, my mental model (maybe wrong) is that while the compiler is optimising code, for debuginfo it has a state-machine of variable locations represented by dbg.values that it needs to try and preserve the order of, marking locations undef if they don't exist any more. The logical consequence is that if we had a function where the compiler managed to completely reverse the order of computation, no variable would have a location, which would be sound, but not useful.

Does it matter that we actually sink into a later BB here? Is this fixing some problem where it from a debugging perspective would appear wrong if the variable does not get the new value before the end of the BB?

The sinking is irrelevant to the question I think, in that it's a general question of "should re-ordered variable assignments be visible?".

NikolaPrica added inline comments.
lib/CodeGen/MachineSink.cpp
788 ↗(On Diff #186983)

TargetInstrInfo::isCopyInstr here? It should support pseudo COPY instruction and target specific register copy instructions.

jmorse updated this revision to Diff 187771.Feb 21 2019, 5:18 AM

Use isCopyInstr to detect copy instructions, which will catch more opportunities that just isCopy().

jmorse marked 2 inline comments as done.Feb 21 2019, 5:18 AM
jmorse added inline comments.
lib/CodeGen/MachineSink.cpp
788 ↗(On Diff #186983)

Sounds good, updated test to call isCopyInstr.

So this basically prevents the debugger from displaying an inconsistent program state when some of the sideeffects from a line preceding its current stopping point have not been completed yet. Makes sense to me. One minor concern: If the sunk instruction does not get sunk past the next source line, we may create some unneeded location list entries, e.g.:

x = y; x = x + 3; z = z ? z + 3 : b; // all on the same line

If the x + 3 is sunk past the assignment to z (but before any code attributed to the next line is executed) we generate an additional location list entry for x, where there is a small gap in the covered range for x, even though the user is not able to observe x until the next line.
Doesn't look like a big deal, though.

So this basically prevents the debugger from displaying an inconsistent program state when some of the sideeffects from a line preceding its current stopping point have not been completed yet. Makes sense to me. One minor concern: If the sunk instruction does not get sunk past the next source line, we may create some unneeded location list entries, e.g.:

x = y; x = x + 3; z = z ? z + 3 : b; // all on the same line

If the x + 3 is sunk past the assignment to z (but before any code attributed to the next line is executed) we generate an additional location list entry for x, where there is a small gap in the covered range for x, even though the user is not able to observe x until the next line.
Doesn't look like a big deal, though.

Still nice/potentially worthwhile to have the location accuracy on a per-instruction basis. You could imagine if + was an operator overload and you broke inside that, went up a frame and tried ot print the variable's value - you're at the '+' (or assignment, etc) exactly, within a line, so there's still value in having the location be accurate on that sub-line granularity.

bjope added a comment.Feb 22 2019, 1:56 AM

Maybe it is a philosophical question. Is the compiler scheduling/reordering source instructions (in the debugger it will appear as source statements are executed in a random order), or are we scheduling/reordering machine instructions (and in the debugger it still should appear as if we execute variable assignments in the order of the source code). VLIW-scheduling, tail merging, etc, of course make things extra complicated, since we basically will be all over the place all the time.

To me it's the latter, my mental model (maybe wrong) is that while the compiler is optimising code, for debuginfo it has a state-machine of variable locations represented by dbg.values that it needs to try and preserve the order of, marking locations undef if they don't exist any more. The logical consequence is that if we had a function where the compiler managed to completely reverse the order of computation, no variable would have a location, which would be sound, but not useful.

So then we should check if there is any other dbg.value that we sink past? Otherwise there won't be any reordering and no need for an undef location?
It might be that we sink past instructions that has been sunk earlier (or simply been re-scheduled at ISel etc), so it could be that we restore the source order when sinking. Or we could be sinking past some constant materialisation that has no associated source line. Etc.

After ISel we could have something like this:

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...
%5 = SUB 0, 0

or we could just as well have the SUB before the ADD afaict

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%5 = SUB 0, 0
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

Sinking the ADD past the SUB would introduce an undef location for "x", when done by MachineSink. But not if the instructions were emitted in that order already at ISel.
So minor changes in ISel-scheduling could ripple down, impacting how many undef DBG_VALUE we see after MachineSink. It is not like ISel care that much about where it inserts this SUB afaict.

When for example regalloc inserts spill/reload code, I guess that it in some sense can be seen as the spill is hoisted from the reload, so in effect it sinks lots of instructions past the spilling instruction. But it does not seem correct to sprinkle undef DBG_VALUE instructions all over the code, in such situations.

Anyway, I'm still a little bit confused, and haven't really understood the full consequence of this.
Just got a feeling that some criterias for when to do this is missing, but maybe the aptch just takes a defensive appoach. We do not really need to insert the undef location always, but we need to do it sometimes. So it is better to do it too often compared to too seldom (as having an undef location always is OK).

jmorse planned changes to this revision.Feb 22 2019, 5:42 AM
jmorse marked an inline comment as done.

So then we should check if there is any other dbg.value that we sink past? Otherwise there won't be any reordering and no need for an undef location?

True -- I'd previously been ignoring this issue as I thought it'd require another re-scan of instructions we sink past, but as the calling code walks through blocks backwards, it should be easy to maintain a little extra state recording what's been seen. (I might roll this into the parent patch too).

When for example regalloc inserts spill/reload code, I guess that it in some sense can be seen as the spill is hoisted from the reload, so in effect it sinks lots of instructions past the spilling instruction. But it does not seem correct to sprinkle undef DBG_VALUE instructions all over the code, in such situations.

I don't completely follow this, a vreg that gets spilt-and-reloaded has a location even when it's spilt surely? (How livedebugvalues and debug locations for spilt values works isn't something I've looked into).

Anyway, I'm still a little bit confused, and haven't really understood the full consequence of this.
Just got a feeling that some criterias for when to do this is missing, but maybe the aptch just takes a defensive appoach. We do not really need to insert the undef location always, but we need to do it sometimes. So it is better to do it too often compared to too seldom (as having an undef location always is OK).

It's definitely a defensive approach, as covered above I tend to prioritise not allowing unsound variable locations, even when we could be more complete.

jmorse added a comment.EditedMar 7 2019, 3:47 AM

I produced another patch (D59027, work-in-progress) that only creates undef DBG_VALUEs when the order of assignments would change... however I then realised I'd misunderstood Bjorns question here:

It might be that we sink past instructions that has been sunk earlier (or simply been re-scheduled at ISel etc), so it could be that we restore the source order when sinking. Or we could be sinking past some constant materialisation that has no associated source line. Etc.

After ISel we could have something like this:

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...
%5 = SUB 0, 0

or we could just as well have the SUB before the ADD afaict

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%5 = SUB 0, 0
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

Sinking the ADD past the SUB would introduce an undef location for "x", when done by MachineSink. But not if the instructions were emitted in that order already at ISel.

If I understand you correctly, this actually isn't a case that MachineSink deals with: MachineSink sinks insts from parent to child basic blocks, to remove partially redundant computations. Take this example [0], where the load of %ab is only used down one path, MachineSink moves it to only execute in the block which actually uses the load. With that in mind it might be clearer why undefs become necessary: if the DBG_VALUE moves to a different block, one of the source-level assignments to the corresponding variable has disappeared from other paths through the program. Those paths should have the variable appear ``optimised out'' until the variable gets a new location, to stop the old location being used on the other paths which would present a stale variable value (effectively this re-orders the appearance of assignments).

Using the example quoted above, if ADD were sunk into a _successor_ block, then an undef DBG_VALUE would have to go somewhere, to terminate the earlier location of "x" in the other successor blocks. (NB, MachineSink doesn't operate if there's only one successor).

[0] https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/X86/MachineSink-DbgValue.ll

bjope added a comment.Mar 7 2019, 4:08 PM

I produced another patch (D59027, work-in-progress) that only creates undef DBG_VALUEs when the order of assignments would change... however I then realised I'd misunderstood Bjorns question here:

It might be that we sink past instructions that has been sunk earlier (or simply been re-scheduled at ISel etc), so it could be that we restore the source order when sinking. Or we could be sinking past some constant materialisation that has no associated source line. Etc.

After ISel we could have something like this:

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...
%5 = SUB 0, 0

or we could just as well have the SUB before the ADD afaict

%1 = LOAD ...
DBG_VALUE %1, %noreg, "x", ...
%2 = AND ...
%5 = SUB 0, 0
%3 = ADD %1, 3
DBG_VALUE %3, %noreg, "x", ...

Sinking the ADD past the SUB would introduce an undef location for "x", when done by MachineSink. But not if the instructions were emitted in that order already at ISel.

If I understand you correctly, this actually isn't a case that MachineSink deals with: MachineSink sinks insts from parent to child basic blocks, to remove partially redundant computations. Take this example [0], where the load of %ab is only used down one path, MachineSink moves it to only execute in the block which actually uses the load. With that in mind it might be clearer why undefs become necessary: if the DBG_VALUE moves to a different block, one of the source-level assignments to the corresponding variable has disappeared from other paths through the program. Those paths should have the variable appear ``optimised out'' until the variable gets a new location, to stop the old location being used on the other paths which would present a stale variable value (effectively this re-orders the appearance of assignments).

Using the example quoted above, if ADD were sunk into a _successor_ block, then an undef DBG_VALUE would have to go somewhere, to terminate the earlier location of "x" in the other successor blocks. (NB, MachineSink doesn't operate if there's only one successor).

[0] https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/X86/MachineSink-DbgValue.ll

Yes, I was just thinking out loud about sinking/scheduling in general and not so much the specific situation with MachineSink. But I also wondered if the sinking done by MachineSink could be seen as a special case of sinking in general, where we sink one instruction at a time. For MachineSink we eventually reach the end of the BB, and then continue into a successor. So how do we determine when it is time to insert "undef" when sinking one instruction at a time? What kind of reorderings should/shouldn't trigger that we insert an "undef"? I guess this is one thing we should try to describe in the documentation (also for DGB_VALUE), to make sure new developers understand the basic logic behind how we implement these things.

I only had a quick look at D59027, but it feels like it tries to handle this a little bit more "carefully" (avoiding <optimized out> in some situations compared to this patch).
Btw, I'll be OOO for a week. So I won't be able to give much more feedback right now. No need to wait for my approval if you get LGTM from someone else while I'm away.

ormris removed a subscriber: ormris.Mar 7 2019, 4:11 PM
jmorse requested review of this revision.Mar 8 2019, 2:49 AM

Yes, I was just thinking out loud about sinking/scheduling in general and not so much the specific situation with MachineSink. But I also wondered if the sinking done by MachineSink could be seen as a special case of sinking in general, where we sink one instruction at a time. For MachineSink we eventually reach the end of the BB, and then continue into a successor. So how do we determine when it is time to insert "undef" when sinking one instruction at a time? What kind of reorderings should/shouldn't trigger that we insert an "undef"? I guess this is one thing we should try to describe in the documentation (also for DGB_VALUE), to make sure new developers understand the basic logic behind how we implement these things.

Ahhh, now that all fits in my mind, cool. I'll certainly ship a docs patch, when I've convinced myself I know what's going on at the CodeGen level :o

I only had a quick look at D59027, but it feels like it tries to handle this a little bit more "carefully" (avoiding <optimized out> in some situations compared to this patch).
Btw, I'll be OOO for a week. So I won't be able to give much more feedback right now. No need to wait for my approval if you get LGTM from someone else while I'm away.

Cool, now clicking a phab button that allegedly will put this back in for review.

vsk added inline comments.Jul 10 2019, 2:07 PM
lib/CodeGen/MachineSink.cpp
801 ↗(On Diff #187771)

This subreg equality check seems a bit tricky. Mind breaking it up? Currently it reads like '(true|false) == DstMO->getSubReg()', which is surprising because I expected an unsigned-unsigned comparison.

jmorse updated this revision to Diff 209235.Jul 11 2019, 8:52 AM

Split up subregister comparision into pairs, to make it clear we're comparing unsigneds.

While we're here, update JCC insts in the test, and remove some un-necessary attributes.

jmorse marked an inline comment as done.Jul 11 2019, 8:53 AM
aprantl added inline comments.Jul 29 2019, 1:30 PM
lib/CodeGen/MachineSink.cpp
808 ↗(On Diff #209235)

There ought to be some simplification of these two conditions possible. Perhaps adding a continue?

jmorse updated this revision to Diff 218349.Sep 2 2019, 6:12 AM

This update simplifies & flattens the copy-forwarding logic as suggested. Rather than trying to select the condition where copy-forwarding is valid, instead continue around the loop whenever a precondition isn't met.

This saves us a level of indentation and makes the conditionals much easier to understand.

aprantl added inline comments.Sep 3 2019, 9:05 AM
lib/CodeGen/MachineSink.cpp
812 ↗(On Diff #218349)

nit: else after continue is redundant.

jmorse updated this revision to Diff 218716.Sep 4 2019, 8:25 AM

Nix an un-necessary else.

jmorse marked 4 inline comments as done.Sep 4 2019, 8:25 AM

Ping -- I think this is more or less accepted, but I'd feel happier if this patch was green.

aprantl accepted this revision.Oct 24 2019, 2:47 PM

I don't want to preempt any of the other reviewers, but from my point I think this looks good.

lib/CodeGen/MachineSink.cpp
774 ↗(On Diff #218716)

Not your code, but this looks like it could be replaced by a range-based-for.

This revision is now accepted and ready to land.Oct 24 2019, 2:47 PM
This revision was automatically updated to reflect the committed changes.

Hi! This patch seems to drastically increase compile times for some fuchsia builds. I'm working on creating a small reproducer that I can share, but just wanted to raise awareness in the meantime.

Is the extra cost here, or is it more work being generated for LiveDebugValues?

Is the extra cost here, or is it more work being generated for LiveDebugValues?

I'm not sure what under the hood is causing this longer compile time. The most I can tell from a time trace is that a lot of the time is taken up in the backend. I've filed https://bugs.llvm.org/show_bug.cgi?id=43855 to track this. @jmorse could you look into this when you get the chance? Thanks.