Page MenuHomePhabricator

[FastISel] Flush local value map on every instruction
ClosedPublic

Authored by probinson on Nov 18 2020, 11:45 AM.

Details

Summary

Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

Local values may also be used by no-op casts, which adds the
register to the RegFixups table. Without reversing the RegFixups
map direction, we don't have enough information to sink these
instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when th
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction. This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block. Neither of those consequences is good
for debugging.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

There is additional cleanup to be done after the functional
patch; the SinkLocalValues option can go away, as well as
the LastFlushPoint member. I figured the functional patch
is big enough as it is, and these NFC follow-ups would be easy.
be

Diff Detail

Event Timeline

probinson created this revision.Nov 18 2020, 11:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2020, 11:45 AM

Dammit. Visual Studio keeps running clang-format on me. I thought I had that turned off and fixed up the formatting but apparently not. I'll make sure those bits aren't in the next round.

rnk added a subscriber: MatzeB.Nov 18 2020, 1:03 PM

Thanks! It looks like the presubmit/Harbormaster testing found some more test failures that need addressing.

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
223

This dead local value removal logic was part of D43093, right? At least, that's what I remember. Anyway, it's pretty simple, so I agree we should keep it.

llvm/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll
5–6

Huh. @MatzeB, do you think this test case can be repaired?

llvm/test/DebugInfo/X86/fission-ranges.ll
44–45

Well there's a fragile test. :(

probinson updated this revision to Diff 306208.Nov 18 2020, 1:30 PM

Some sort of pilot error on that first diff. This should be better.

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
223

Yes, this is modeled on that code. It definitely catches things.

I was trying to use removeDeadLocalValueCode but it caught too much, so I went back to this.

FTR, I plan to remove the -fast-isel-sink-local-instructions option. I think there might be some now-unused fields as well, such as LastFlushPoint, that could be removed. These are NFC follow-ups that can be deferred, this patch is big enough as it is.

In D91734#2403719, @rnk wrote:

Thanks! It looks like the presubmit/Harbormaster testing found some more test failures that need addressing.

AArch64/arm64-fast-isel.ll was due to the pilot error in generating the original diff.
lld/debug-removed-fn.ll, sorry I only ran check-llvm! One address changed, I'll fix that shortly.
The HWAsan one... I don't normally run sanitizers, so I'm not clear on how to do that :) might take a little while to repro.

Fix an lld test and two lldb tests.

I don't reproduce the HWASan failure on my Linux.

rnk accepted this revision.Nov 23 2020, 2:56 PM

I think the code looks good, but I don't want to create an orphaned test case for emergency spill slot creation. Please file a bug about it, reference it from the comments, and go ahead and land it.

I filed a bug about the hwasan test failure, and Evgeniy is looking into it:
https://bugs.llvm.org/show_bug.cgi?id=48274

llvm/test/CodeGen/Mips/emergency-spill-slot-near-fp.ll
5–6

Given that Matthias hasn't responded, please file a bug about this, assign it to him, reference it here, and we'll commit it like that.

This revision is now accepted and ready to land.Nov 23 2020, 2:56 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptNov 25 2020, 10:05 AM

This causes several regressions in the gdb test suite we're running internally - the failures are:

FAIL: gdb.base/break.exp: breakpoint at start of multi line while conditional
FAIL: gdb.base/break.exp: breakpoint info
FAIL: gdb.base/foll-exec.exp: step through execlp call
FAIL: gdb.base/foll-exec.exp: step after execlp call
FAIL: gdb.base/foll-exec.exp: print execd-program/global_i (after execlp)
FAIL: gdb.base/foll-exec.exp: print execd-program/local_j (after execlp)
FAIL: gdb.base/foll-exec.exp: print follow-exec/local_k (after execlp)
FAIL: gdb.base/hbreak2.exp: hardware breakpoint at start of multi line while conditional
FAIL: gdb.base/hbreak2.exp: hardware breakpoint info

I looked into the first one in break.exp and reduced it a bit, given code like this:

int main() { }
int f1(int a, int b) {
  return a && b;
}

debugging this binary with gdb and running b f1 results in breaking on an instruction with no line information after this change, eg:
old behavior:

(gdb) b f1
Breakpoint 1 at 0x40112c: file test.c, line 2.

new behavior:

(gdb) b f1
Breakpoint 1 at 0x40112a

The assembly difference:

-       xorl    %eax, %eax
-                                        # kill: def $al killed $al killed $eax
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
 .Ltmp2:
-       .loc    1 3 10 prologue_end             # break.c:3:10
+       .loc    1 0 0 prologue_end              # break.c:0:0
+       xorl    %eax, %eax
+                                        # kill: def $al killed $al killed $eax
+       .loc    1 3 10                          # break.c:3:10
        cmpl    $0, -4(%rbp)
        movb    %al, -9(%rbp)                   # 1-byte Spill

So it looks like the && caused a location-less zero constant to be synthesized and that previously floated up into the prologue - but now it's pulled down past the prologue begin, so after the prologue this location-less zero is the first location.

I've reverted this patch (& dependent/follow-on patches) in 615f63e149f31d6a97b5233b4fe634db92e19aa9 - happy to help with further reproductions/validations (if/when you've got a proposed fix, if you can't run the gdb test suite or otherwise reproduce the failures, I can run our internal infrastructure to validate again)

rnk added a comment.Dec 1 2020, 2:55 PM

I see. We should give that constant materialization a location. It looks like it is coming from a phi node. The IR looks like this:

  %6 = icmp ne i32 %5, 0, !dbg !11
  br i1 %6, label %7, label %10, !dbg !12

7:                                                ; preds = %2
  %8 = load i32, i32* %4, align 4, !dbg !13
  %9 = icmp ne i32 %8, 0, !dbg !13
  br label %10

10:                                               ; preds = %7, %2
  %11 = phi i1 [ false, %2 ], [ %9, %7 ], !dbg !14   ;;;; Probably where the zero comes from
  %12 = zext i1 %11 to i32, !dbg !11
  ret i32 %12, !dbg !15

The PHI node has location !14, which is a line 0 location. Is there a reason we give this PHI a line 0 location, when it's built by the frontend for the conditional operator? IMO we should use the location of the br instruction, which will be the location of the conditional operator (&& at the source level).

Paul has already shown that flushing the local value map improves debug info quality in general. If we can't fix all the gdb test suite failures, IMO we should consider XFAILing them and moving on.

The constant materialization (xorl) obviously does not belong in the prologue; I think the instruction ordering with the patch is preferable.
But, clearly we'd rather not have it attributed to line 0, but the line it actually belongs to. I'll start looking into this. Offhand the reduced example appears to be not unreasonably large :-)

We do have a gdb suite of some form running on our downstream compiler, but this patch series hasn't made it there yet. When I put a new patch up I'll tag you @dblaikie and you can try it out on your bot.

@rnk is correct that the phi's source location is propagating to the xorl. I modified the .ll file to put a different line number on the phi, and the xorl picked it up. This is clearly on the front-end to get right.

If we want to get down to cases, there are other poor choices of source location in this tiny example. The statement source is

return a && b;

The 'a' is at column 10, the '&&' at 12, the 'b' at 15.
Now consider the load/test of 'a':

  %0 = load i32, i32* %a.addr, align 4, !dbg !19
  %tobool = icmp ne i32 %0, 0, !dbg !19
  br i1 %tobool, label %land.rhs, label %land.end, !dbg !20
...
!19 = !DILocation(line: 3, column: 10, scope: !12)
!20 = !DILocation(line: 3, column: 12, scope: !12)

The load and icmp both use !19, the source location of 'a' in the expression; the br uses !20, the '&&'. All seems good. Now consider the 'b' part:

  %1 = load i32, i32* %b.addr, align 4, !dbg !21
  %tobool1 = icmp ne i32 %1, 0, !dbg !20
  br label %land.end
...
!20 = !DILocation(line: 3, column: 12, scope: !12)
!21 = !DILocation(line: 3, column: 15, scope: !12)

The load uses !21, 'b', but the icmp points to the '&&', and the br points nowhere. This is at least inconsistent with how the LHS of the expression is handled, and I would argue it's actually wrong.

This gets reflected in the generated code:

	.loc	1 0 0 prologue_end              # gdb-fail.c:0:0
	xorl	%eax, %eax                      ;;;; because of the phi's line-0
                                        # kill: def $al killed $al killed $eax
	.loc	1 3 10                          # gdb-fail.c:3:10
	cmpl	$0, -4(%rbp)                    ;;;; load/compare of 'a' at column 10
	movb	%al, -9(%rbp)                   # 1-byte Spill
	.loc	1 3 12 is_stmt 0                # gdb-fail.c:3:12
	je	.LBB1_2                         ;;;; the je attributed to the &&
# %bb.1:                                # %land.rhs
	cmpl	$0, -8(%rbp)                    ;;;; load/compare of 'b' at column 12 (??) not 15
	setne	%al
	movb	%al, -9(%rbp)                   # 1-byte Spill
.LBB1_2:                                # %land.end
	.loc	1 0 12                          # gdb-fail.c:0:12
	movb	-9(%rbp), %al                   # 1-byte Reload

I'd also be happier if the reload was attributed to the same source location as the following instructions, but that's probably a whole 'nother can of worms.

Just for grins, change the && to || and see what happens...
The xorl becomes a movb $1 (no surprise there). But, that instruction no longer has line-0, instead it becomes part of the prologue.
This tells me that the xorl had an explicit line 0, while the 'movb $1' has no location (a subtle but important difference).

There are also curious differences between the CGExprScalar.cpp methods VisitBinLAnd() and VisitBinLOr(); the former is more attentive to line attributions than the latter, which seems likely to be an oversight dating back a decade or so.

Onward into the breach.

In D91734#2426897, @rnk wrote:

I see. We should give that constant materialization a location.

Yeah, likely - if this patch makes constant materialization local to the IR instruction - if there's a reasonable location the IR instruction should have, that seems fair to me.

(a more general fix (that would cover cases where the instruction really has no location) might be to propagate locations from the first instruction in a basic block with a location back up to the start of the basic block - I forget if we've considered/tried that before)

It looks like it is coming from a phi node. The IR looks like this:

  %6 = icmp ne i32 %5, 0, !dbg !11
  br i1 %6, label %7, label %10, !dbg !12

7:                                                ; preds = %2
  %8 = load i32, i32* %4, align 4, !dbg !13
  %9 = icmp ne i32 %8, 0, !dbg !13
  br label %10

10:                                               ; preds = %7, %2
  %11 = phi i1 [ false, %2 ], [ %9, %7 ], !dbg !14   ;;;; Probably where the zero comes from
  %12 = zext i1 %11 to i32, !dbg !11
  ret i32 %12, !dbg !15

The PHI node has location !14, which is a line 0 location. Is there a reason we give this PHI a line 0 location, when it's built by the frontend for the conditional operator? IMO we should use the location of the br instruction, which will be the location of the conditional operator (&& at the source level).

Paul has already shown that flushing the local value map improves debug info quality in general. If we can't fix all the gdb test suite failures, IMO we should consider XFAILing them and moving on.

For sure there's tipping points where one bug is worth another - but generally it's best to avoid that if possible. (& a sliding scale of "can be fixed" - matter of code, complexity, etc - yeah, if the alternative is rewriting big parts of LLVM to avoid regressing one thing to progress here - I'm with you, there's some place where we'd tradeoff some bugs for some other bugs)

Just for grins, change the && to || and see what happens...
The xorl becomes a movb $1 (no surprise there). But, that instruction no longer has line-0, instead it becomes part of the prologue.
This tells me that the xorl had an explicit line 0, while the 'movb $1' has no location (a subtle but important difference).

There are also curious differences between the CGExprScalar.cpp methods VisitBinLAnd() and VisitBinLOr(); the former is more attentive to line attributions than the latter, which seems likely to be an oversight dating back a decade or so.

Onward into the breach.

I did some work years back to try to streamline the location of instructions when code generating expressions in clang. Mostly centered around the creation and use of ApplyDebugLocation scoped type. Which you can see used in various places, but for instance, at the start of CodeGenFunction::EmitLValue(const Expr *E) - the idea being that all instructions related to code generating that expression would have a location the same as the preferred location (the same place clang points to in error messages) of the expression - and any subexpression would get a similar treatment, overriding the outer expression's location in turn.

Let's see what's up with VisitBinLAnd and VisitBinLOr... Ah, you're referring to the use of ApplyDebugLocation around the two EmitBlock calls in VisitBinLAnd that are missing in VisitBinLOr? Yeah, looks like an omission on my part back when - I vaguely remember that whole "unconditional branches shouldn't have locations" thing, but hopefully there's enough info in the commit messages to help explain why it's useful as I don't think I have that much context anymore.

Right you are about line 0 V no location - the phi after the && has line 0, but the phi after the || has no line.

Here's where the line 0 for the Phi is coming from: https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGExprScalar.cpp#L4039 - I was going to blame myself, but seems to have come from here: https://reviews.llvm.org/D47720

This patch might be undoing, awkwardly, the point of my original patch of omitting locations on unconditional jumps... Hmm, nope, that's just about the jumps.

Looks like this Phi creation ( https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGExprScalar.cpp#L4012 ) misses the opportunity to get the location from the expression, because it's not created by the IRBuilder - perhaps it should be, rather than being given an artificial location - we know which expression it's coming from, just as much as we know the instruction for the loads, etc. So maybe that code should be ApplyDebugLocation::CreateArtificial and scope, and instead set the location to the current debug location - probably move it up to where the Phi is being constructed too - I wouldn't've thought there was a need to move it away from that point.

Sometihng like this seems plausible to me:

$ git diff
diff --git clang/lib/CodeGen/CGExprScalar.cpp clang/lib/CodeGen/CGExprScalar.cpp
index c906af8a4afa..c85ce46508a6 100644
--- clang/lib/CodeGen/CGExprScalar.cpp
+++ clang/lib/CodeGen/CGExprScalar.cpp
@@ -4187,6 +4187,7 @@ Value *ScalarExprEmitter::VisitBinLAnd(const BinaryOperator *E) {
   // setting up the PHI node in the Cont Block for this.
   llvm::PHINode *PN = llvm::PHINode::Create(llvm::Type::getInt1Ty(VMContext), 2,
                                             "", ContBlock);
+  PN->setDebugLoc(Builder.getCurrentDebugLocation());
   for (llvm::pred_iterator PI = pred_begin(ContBlock), PE = pred_end(ContBlock);
        PI != PE; ++PI)
     PN->addIncoming(llvm::ConstantInt::getFalse(VMContext), *PI);
@@ -4209,12 +4210,6 @@ Value *ScalarExprEmitter::VisitBinLAnd(const BinaryOperator *E) {
   // Insert an entry into the phi node for the edge with the value of RHSCond.
   PN->addIncoming(RHSCond, RHSBlock);

-  // Artificial location to preserve the scope information
-  {
-    auto NL = ApplyDebugLocation::CreateArtificial(CGF);
-    PN->setDebugLoc(Builder.getCurrentDebugLocation());
-  }
-
   // ZExt result to int.
   return Builder.CreateZExtOrBitCast(PN, ResTy, "land.ext");
 }
diff --git clang/test/CodeGen/debug-info-inline-for.c clang/test/CodeGen/debug-info-inline-for.c
index 55066b28a1ff..5d1b1e7fc3bd 100644
--- clang/test/CodeGen/debug-info-inline-for.c
+++ clang/test/CodeGen/debug-info-inline-for.c
@@ -10,4 +10,4 @@ int func(int n) {
 // CHECK: land.end:
 // CHECK-NEXT: {{.*}} = phi i1 {{.*}} !dbg ![[DbgLoc:[0-9]+]]

-// CHECK: ![[DbgLoc]] = !DILocation(line: 0
+// CHECK: ![[DbgLoc]] = !DILocation(line: 6, column: 19

Not sure if it addresses all of the regressions though.

Sometihng like this seems plausible to me:

Yes, I was playing with essentially that exact patch last night. It has no effect on the final assembly on its own, but combined with my patch it does.

(a more general fix (that would cover cases where the instruction really has no location) might be to propagate locations from the first instruction in a basic block with a location back up to the start of the basic block - I forget if we've considered/tried that before)

We have, but that without flushing the map on every instruction, so it caught materialization instructions that didn't actually belong to that IR instruction. The tactic would likely be more reasonable in conjunction with my patch.

Sometihng like this seems plausible to me:

Yes, I was playing with essentially that exact patch last night. It has no effect on the final assembly on its own, but combined with my patch it does.

It might have effects on assembly in other test cases, though. Could be worth running it through a self-host or something to see what other changes it causes and whether they're desirable.

(a more general fix (that would cover cases where the instruction really has no location) might be to propagate locations from the first instruction in a basic block with a location back up to the start of the basic block - I forget if we've considered/tried that before)

We have, but that without flushing the map on every instruction, so it caught materialization instructions that didn't actually belong to that IR instruction. The tactic would likely be more reasonable in conjunction with my patch.

(oh, when I was saying that I didn't really think - the materialization in this case wasn't necessarily on a BB boundary - so I guess my suggestion amounts to possibly never using line 0 (unless it's the only location in a whole BB), instead back or forward propagating surrounding locations over any line 0 - and that doesn't sound right when I put it that way)

But yeah, maybe some amount of it could be done around the flushing thing.

(FWIW, about this patch in general, I do worry a bit about this being a debug-info motivated optimization decision (even if that decision is applied uniformly/not just when debug info is enabled) - you mention this might have positive performance features due to smaller register live ranges, but also possibly negative ones rematerializing the same constant (" however, only about 5% of those values
were reused in an experimental self-build of clang.") - do you have performance measurements/benchmarks related to this change? I guess it didn't show up in the perf bot profiles upstream at least)

See D92606 for a front-end patch to improve locations in the IR.
That, plus reapplying this patch, should help out GDB. I haven't had a chance to run the suite myself with both patches applied and I'm basically off tomorrow, so if @dblaikie doesn't have a chance, I'll do that on Monday.

Note, lldb has a bunch of special handling for line 0 code. For instance, when we are pushing a breakpoint past the prologue we will keep pushing it forward over line number 0 lines. Those are compiler generated and in general people don't want to stop there. Similarly, if you are stepping through line 3 and the next line entry after 3 is line 0 we keep stepping till we get to a non-zero line.

When the compiler is actually using line 0 to mean "compiler generated code not really associated with a particular line, then I am pretty sure the debugger has to be aware of this or debugging is going to be a bit awkward...

I don't know if that's directly relevant to this bug, I haven't had time to follow the whole discussion. But I'm not convinced all the problems with line 0 emission causing debugging oddities can be solved in the line table generation.

Jim

See D92606 for a front-end patch to improve locations in the IR.
That, plus reapplying this patch, should help out GDB. I haven't had a chance to run the suite myself with both patches applied and I'm basically off tomorrow, so if @dblaikie doesn't have a chance, I'll do that on Monday.

Had a go - certainly creates a better debugging experience, but still fails the gdb test in question.

The code in the original test looks something like:

int multi_line_while_conditional (int a, int b, int c)
{
  while (a /* set breakpoint 4 here */
      && b 
      && c) 
    { 
      a--, b--, c--;
    }
  return 0;
}

And with these changes together, breaking on the function breaks on line 5 (presumably because this is creating the constant used by the second '&&' which is on line 5) instead of line 3. Not the worst thing, but I imagine given Sony's interest in less "jumpy" line tables, this might not be super desirable.

Yeah, the gdb experience is less than ideal:

13        multi_line_while_conditional(1, 1, 1);
(gdb) s
multi_line_while_conditional (a=1, b=1, c=1) at test.cpp:5
5             && c) 
(gdb) n
3         while (a /* set breakpoint 4 here */
(gdb) 
4             && b 
(gdb) 
5             && c) 
(gdb) 
4             && b 
(gdb) 
5             && c) 
(gdb) 
3         while (a /* set breakpoint 4 here */
(gdb) 
7             a--, b--, c--;

Compared with (without any of these patches):

13        multi_line_while_conditional(1, 1, 1);
(gdb) s
multi_line_while_conditional (a=1, b=1, c=1) at test.cpp:3
3         while (a /* set breakpoint 4 here */
(gdb) n
4             && b 
(gdb) 
5             && c) 
(gdb) 
3         while (a /* set breakpoint 4 here */
(gdb) 
7             a--, b--, c--;

And because I was curious about /exactly/ which tokens the debugger was stepping to, I split out the tokens onto their own lines:
With patch:

18        multi_line_while_conditional(1, 1, 1);
(gdb) s
multi_line_while_conditional (a=1, b=1, c=1) at test.cpp:8
8             && 
(gdb) n
5               a 
(gdb) 
6             && 
(gdb) 
8             && 
(gdb) 
7             b 
(gdb) 
8             && 
(gdb) 
9             c
(gdb) 
3         while 
(gdb) 
12            a--, b--, c--;

Without patch/with trunk:

18        multi_line_while_conditional(1, 1, 1);
(gdb) s
multi_line_while_conditional (a=1, b=1, c=1) at test.cpp:5
5               a 
(gdb) n
6             && 
(gdb) 
7             b 
(gdb) 
8             && 
(gdb) 
9             c
(gdb) 
3         while 
(gdb) 
12            a--, b--, c--;

Maybe it's OK? But at least it's 'interesting' enough that might deserve some extra scrutiny.

Also, FWIW, the other gdb test suite failures we saw were:

FAIL: gdb.base/foll-exec.exp: step through execlp call
FAIL: gdb.base/foll-exec.exp: step after execlp call
FAIL: gdb.base/foll-exec.exp: print execd-program/global_i (after execlp)
FAIL: gdb.base/foll-exec.exp: print execd-program/local_j (after execlp)
FAIL: gdb.base/foll-exec.exp: print follow-exec/local_k (after execlp)
FAIL: gdb.base/hbreak2.exp: hardware breakpoint at start of multi line while conditional 
FAIL: gdb.base/hbreak2.exp: hardware breakpoint info

Those last two are different versions of this original issue seen in break.exp/c with multi-line conditionals.

The foll-exec.exp ones I'm less sure of - Looks like maybe it's a similar line table attribution sort of thing, but it leads the test case not to step across the intended process boundary/exec call and things go weird from there.

Yeah, it boils down to something like this:

void f1(const char*, const char*) { }
int main() {
  char prog[1];

  f1(prog,
     prog);
}

Without the patch, you step to the 'f1' line, and then step into or over the function call (step or next).
But with these patches applied - you step to the 'f1(' line, then the 'prog);' line, then back to the 'f1(' line, then into/over the function call.

Again, could be acceptable - if those arguments had specific non-trivial code in them (like other function calls) you'd certainly get that step forward/backward behavior - but it's notewhorthy that this is change would make more cases like that & it'd be good to understand why/if that's a good thing overall.

And with these changes together, breaking on the function breaks on line 5 (presumably because this is creating the constant used by the second '&&' which is on line 5) instead of line 3. Not the worst thing, but I imagine given Sony's interest in less "jumpy" line tables, this might not be super desirable.

It's breaking on an xorl %eax,%eax which is produced by the PHI at the end of the conditional expression, which now has the source location of the operator at the top of the expression tree, which is the second && operator, yeah. Not the best. (FTR, the jumpiness complaints we get are usually more egregious, hopping between different source statements somewhat arbitrarily; not sure anyone would complain too loudly about hopping around within an expression. We see less of that in any case, because we suppress column info, but still.)

The real bear here is getting that constant to behave reasonably. In all cases the sequence of instructions is the same, but silly things keep happening to the line table. (As an aside, GCC does this entire thing with control-flow, not trying to compute and then test the bool result of the expression. They also handle a-- better. But I digress.)

Current HEAD puts the xor in the prologue (which also puts the while-loop top *before* prologue_end, that can't be right). Adding the PHI change doesn't affect this, because FastISel is still materializing the zero in the local value map, which (in HEAD) has no source locations.

.LBB0_1:                                # %while.cond
	xorl	%eax, %eax
.Ltmp0:
	.loc	1 3 10 prologue_end             # multi-line-while.c:3:10
	cmpl	$0, -4(%rbp)

With just my FastISel change (D91734), prologue_end is better, but the xor has line-0, which makes gdb sad (it decides the entire function has no source locations, which is wrong, but whatever):

.LBB0_1:                                # %while.cond
.Ltmp0:
	.loc	1 0 0 prologue_end              # multi-line-while.c:0:0
	xorl	%eax, %eax
	.loc	1 3 10                          # multi-line-while.c:3:10
	cmpl	$0, -4(%rbp)

With the PHI change from D92606, plus D91734, we see the PHI taking effect, which means we first break on line 5 and then hop backwards:

.LBB0_1:                                # %while.cond
.Ltmp0:
	.loc	1 5 10 prologue_end             # multi-line-while.c:5:10
	xorl	%eax, %eax
	.loc	1 3 10                          # multi-line-while.c:3:10
	cmpl	$0, -4(%rbp)

I think we'd prefer something more like this:

.LBB0_1:                                # %while.cond
.Ltmp0:
	.loc	1 3 10 prologue_end             # multi-line-while.c:3:10
	xorl	%eax, %eax
	cmpl	$0, -4(%rbp)

Getting that to happen means not making the PHI change, and persuading FastISel to do something more intelligent. Flushing on every instruction did seem to help, maybe if we made sure that first instruction also had the source location of the first "real" instruction? I will experiment with this tomorrow.

Yeah, it boils down to something like this:

void f1(const char*, const char*) { }
int main() {
  char prog[1];

  f1(prog,
     prog);
}

Without the patch, you step to the 'f1' line, and then step into or over the function call (step or next).
But with these patches applied - you step to the 'f1(' line, then the 'prog);' line, then back to the 'f1(' line, then into/over the function call.

Again, could be acceptable - if those arguments had specific non-trivial code in them (like other function calls) you'd certainly get that step forward/backward behavior - but it's notewhorthy that this is change would make more cases like that & it'd be good to understand why/if that's a good thing overall.

If you dump the IR you should see two GEP instructions, one for each actual parameter expression. Prior to the patch, FastISel found the second one computed the same value as the first one, and reused the value (in this case, the result of lea). This is one of the 5% of cases where a value is being reused across instructions, with the HEAD compiler. After the patch, we flush the local map after each GEP, so the second one gets its own separate lea (with its own source location). So, you're stepping to the first actual, then to the second, then to the call. Hope that answers the "why" question.

Whether it's a "good" thing... that path is fraught with peril. Looking at it somewhat abstractly (as I am sometimes wont to do), Clang is making choices to associate the source location of each parameter with the IR instructions to compute that parameter, and by flushing the map on every IR instruction, LLVM is more faithfully representing that in the final object. This is -O0 after all, where a more faithful representation is more likely to be more debuggable. gcc is making entirely different choices, lumping all parameter-prep instructions in with the function call itself. I'm hard pressed to say one is better than the other; they are different. Coke and Pepsi; people may have preferences, but that's not the same as being higher on a good-ness scale.

Yeah, it boils down to something like this:

void f1(const char*, const char*) { }
int main() {
  char prog[1];

  f1(prog,
     prog);
}

Without the patch, you step to the 'f1' line, and then step into or over the function call (step or next).
But with these patches applied - you step to the 'f1(' line, then the 'prog);' line, then back to the 'f1(' line, then into/over the function call.

Again, could be acceptable - if those arguments had specific non-trivial code in them (like other function calls) you'd certainly get that step forward/backward behavior - but it's notewhorthy that this is change would make more cases like that & it'd be good to understand why/if that's a good thing overall.

If you dump the IR you should see two GEP instructions, one for each actual parameter expression. Prior to the patch, FastISel found the second one computed the same value as the first one, and reused the value (in this case, the result of lea). This is one of the 5% of cases where a value is being reused across instructions, with the HEAD compiler. After the patch, we flush the local map after each GEP, so the second one gets its own separate lea (with its own source location). So, you're stepping to the first actual, then to the second, then to the call. Hope that answers the "why" question.

Whether it's a "good" thing... that path is fraught with peril. Looking at it somewhat abstractly (as I am sometimes wont to do), Clang is making choices to associate the source location of each parameter with the IR instructions to compute that parameter, and by flushing the map on every IR instruction, LLVM is more faithfully representing that in the final object. This is -O0 after all, where a more faithful representation is more likely to be more debuggable. gcc is making entirely different choices, lumping all parameter-prep instructions in with the function call itself. I'm hard pressed to say one is better than the other; they are different. Coke and Pepsi; people may have preferences, but that's not the same as being higher on a good-ness scale.

Fair enough - if you're satisfied these are reasonable cases of different-but-reasonable DWARF locations for the instructions, I'm OK with it.

If possible, would be good if you could run the gdb test suite and document the "regressions" (and possibly document test case changes you'd make to keep those tests passing) with this change - if you're liable to do that sooner or later anyway for your own test infrastructure. Otherwise I'll just handle it on our end.

probinson reopened this revision.Dec 10 2020, 8:27 AM

Reopening, will upload a new diff in a moment that modifies how PHIs are handled. I intend to run the gdb suite on this, will post findings when I have them.

This revision is now accepted and ready to land.Dec 10 2020, 8:27 AM
probinson updated this revision to Diff 310935.Dec 10 2020, 9:29 AM
probinson edited the summary of this revision. (Show Details)
probinson added a reviewer: dblaikie.

Rebase and combine cf1cc774d and dc35368c, plus revise handling constants materialized due to PHIs.
I plan to do some build-time and object size measurements, similar to the first time around.
I plan to run this against the gdb test suite, or at least the local copy we have here at Sony, and post my findings.

I'm remembering that in a number of the tests, I had to add improvements to the use of -LABEL so I could track down the issues. Those improvements should probably be factored out into a separate NFC commit as they're not strictly part of this change. I'll get to that later (but obviously before committing this).

probinson requested review of this revision.Dec 10 2020, 12:05 PM

This is still showing as approved; I'm going to try "Request Review" to see if that helps.

I've run this through our copy of the GDB suite (8.3, not sure how old that is). There are 10 differences, with and without the patch.

FAIL: gdb.base/foll-exec.exp: step through execlp call
FAIL: gdb.base/foll-exec.exp: step after execlp call
FAIL: gdb.base/foll-exec.exp: print execd-program/global_i (after execlp)
FAIL: gdb.base/foll-exec.exp: print execd-program/local_j (after execlp)
FAIL: gdb.base/foll-exec.exp: print follow-exec/local_k (after execlp)

These 5 are all adjacent test points; what happens is that a "next" ends up stepping through parameter evaluation of a call to execl(), where the test expects it to execute the call. Adding two more "next" commands should fix that.

FAIL: gdb.base/call-ar-st.exp: check args of sum_array_print
FAIL: gdb.base/funcargs.exp: continue to call2i
FAIL: gdb.base/funcargs.exp: backtrace from call7k (pattern 1)

These three are all cases where a function parameter list is long enough to require parameters to be passed on the stack. In that situation, prologue_end is being set *way* too early, and so breaking on the function will stop before the parameters have all been homed. Garbage ensues. Haven't gotten to the bottom of that yet.

The news is not all bad. These two changed to PASS:

UNTESTED: gdb.base/break-caller-line.exp: target arch has an instruction after call as part of the caller line
FAIL: gdb.python/python.exp: test find_pc_line with resume address
probinson updated this revision to Diff 314708.Jan 5 2021, 1:33 PM

Change how PHIs are handled; if the operand has a debug location, use it, otherwise don't set a debug location. Then, flushLocalValueMap() will look at the first local-value instruction; if it doesn't already have a debug location, we copy the location from the first instruction after the local-value instructions (which should have the debug location of the original IR instruction).

This version of the patch avoids the weirdness I was seeing with prolog instructions in certain cases.

The gdb test suite is largely happy with this; it avoids the new failures I mentioned previously, and one test needed to have some "next" commands updated. Here's the gdb test suite patch (against 8.3; I can provide more context if it's not clear how to apply to later versions).

diff --git a/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp b/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp
index 3e0653a1..61ee752f 100644
--- a/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp
+++ b/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp
@@ -115,7 +115,10 @@ proc do_exec_tests {} {
    # We should stop in execd-program, at its first statement.
    #
    set execd_line [gdb_get_line_number "after-exec" $srcfile2]
-   send_gdb "next\n"
+    # Clang will emit source locations for the parameter evaluation.
+    # Ideally we'd "next" until we saw 'execlp' again in the source display,
+    # then do one more.
+   send_gdb "next 3\n"
    gdb_expect {
      -re ".*xecuting new program: .*${testfile2}.*${srcfile2}:${execd_line}.*int  local_j = argc;.*$gdb_prompt $"\
                      {pass "step through execlp call"}
@@ -263,7 +266,7 @@ proc do_exec_tests {} {
    # the newly-exec'd program, not after the remaining step-count
    # reaches zero.
    #
-   send_gdb "next 2\n"
+   send_gdb "next 3\n"
    gdb_expect {
      -re ".*xecuting new program: .*${testfile2}.*${srcfile2}:${execd_line}.*int  local_j = argc;.*$gdb_prompt $"\
                      {pass "step through execl call"}

I think I might want to add one more lit test, because the LLVM suite didn't catch a thinko that was the root cause of that last prolog weirdness. But this version is totally ready for review.

dblaikie accepted this revision.Jan 5 2021, 4:21 PM

This version of the patch avoids the weirdness I was seeing with prolog instructions in certain cases.

The gdb test suite is largely happy with this; it avoids the new failures I mentioned previously, and one test needed to have some "next" commands updated. Here's the gdb test suite patch (against 8.3; I can provide more context if it's not clear how to apply to later versions).

diff --git a/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp b/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp
index 3e0653a1..61ee752f 100644
--- a/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp
+++ b/gdb-8.3/gdb/testsuite/gdb.base/foll-exec.exp
@@ -115,7 +115,10 @@ proc do_exec_tests {} {
    # We should stop in execd-program, at its first statement.
    #
    set execd_line [gdb_get_line_number "after-exec" $srcfile2]
-   send_gdb "next\n"
+    # Clang will emit source locations for the parameter evaluation.
+    # Ideally we'd "next" until we saw 'execlp' again in the source display,
+    # then do one more.
+   send_gdb "next 3\n"
    gdb_expect {
      -re ".*xecuting new program: .*${testfile2}.*${srcfile2}:${execd_line}.*int  local_j = argc;.*$gdb_prompt $"\
                      {pass "step through execlp call"}
@@ -263,7 +266,7 @@ proc do_exec_tests {} {
    # the newly-exec'd program, not after the remaining step-count
    # reaches zero.
    #
-   send_gdb "next 2\n"
+   send_gdb "next 3\n"
    gdb_expect {
      -re ".*xecuting new program: .*${testfile2}.*${srcfile2}:${execd_line}.*int  local_j = argc;.*$gdb_prompt $"\
                      {pass "step through execl call"}

I think I might want to add one more lit test, because the LLVM suite didn't catch a thinko that was the root cause of that last prolog weirdness. But this version is totally ready for review.

Thanks - that seems to cover Google's internal gdb test execution as well. Appreciate the test patch!

I haven't looked closely, but I take it this one test modification seems reasonable to you? I guess we're stepping into some subexpressions on a function call that we previously didn't? (they didn't have any instructions attributed to them until now, but it's not unreasonable that they have them attributed - for materializing constants to pass to a function call when the function call/constants are split over multiple lines, etc)

This revision is now accepted and ready to land.Jan 5 2021, 4:21 PM

I haven't looked closely, but I take it this one test modification seems reasonable to you? I guess we're stepping into some subexpressions on a function call that we previously didn't? (they didn't have any instructions attributed to them until now, but it's not unreasonable that they have them attributed - for materializing constants to pass to a function call when the function call/constants are split over multiple lines, etc)

Exactly. In each case, the test is trying to "next" over a function call; gcc attributes all parameter evaluation to the function name, while clang will attribute each parameter to its own location. And when the parameters span multiple source lines, the is_stmt heuristic kicks in, so we stop on each line with actual parameters.

This is not ideal IMO; I'd rather be more principled about (a) stop at every parameter, or (b) stop at no parameters. But without reworking how we do is_stmt, I think fiddling the test to do enough single-steps to actually get past the function call is okay.

I haven't looked closely, but I take it this one test modification seems reasonable to you? I guess we're stepping into some subexpressions on a function call that we previously didn't? (they didn't have any instructions attributed to them until now, but it's not unreasonable that they have them attributed - for materializing constants to pass to a function call when the function call/constants are split over multiple lines, etc)

Exactly. In each case, the test is trying to "next" over a function call; gcc attributes all parameter evaluation to the function name, while clang will attribute each parameter to its own location. And when the parameters span multiple source lines, the is_stmt heuristic kicks in, so we stop on each line with actual parameters.

This is not ideal IMO; I'd rather be more principled about (a) stop at every parameter, or (b) stop at no parameters. But without reworking how we do is_stmt, I think fiddling the test to do enough single-steps to actually get past the function call is okay.

I get that it's a bit unreliable for the user - though GCC's approach isn't so different. If any of the parameter evaluations are themselves function calls, I think it does still multistep as well.

Actually, it seems it does not. Somewhere back around GCC 6 it did what we're doing, but since about 8 at least, no matter what the expressions (at least I tested for + and function call) it attributes all the instructions to the function call line - even the backtrace is misleading - I don't think it's better, just different.

But, yes, we could possibly do better with more strategic is_stmt, but that's a big/complex piece of work to tackle.

But, yes, we could possibly do better with more strategic is_stmt, but that's a big/complex piece of work to tackle.

Oh, absolutely. Didn't mean to imply otherwise. And the behavior now is at least easily explainable in terms of source line.

Thanks for the review, this has been an irritant for our licensees since forever. And especially thanks @rnk for the flush-on-every-instruction suggestion.

This revision was automatically updated to reflect the committed changes.