Page MenuHomePhabricator

[DebugInfo][FastISel] Prevent using debug location from previous block for local values
Needs RevisionPublic

Authored by sdmitrouk on May 20 2015, 10:38 AM.

Details

Summary

As FastISel groups local values at the top of each block or after call
instructions and doesn't attach any debug location to them (it's
intentionally cleared to prevent jumping back and forth in debuggers),
the following situation is possible:

BB#0:
    .loc 1 2 3
    call ...
    <constant load>
    .loc 1 5 6
    ...

BB#1:
    <constant load>
    .loc 1 3 4
    ...

Here <constant load> in both cases "accidentally" gets wrong debug location
from the preceding set of instructions, which is wrong.

This patch adds basic block post-processing for FastISel, which will look for
first instruction with non-null debug location and assign the location to first
local value (if one exists), producing the following:

BB#0:
    .loc 1 2 3
    call ...
    .loc 1 5 6
    <constant load>
    ...

BB#1:
    .loc 1 3 4
    <constant load>
    .loc 1 3 4
    ...

Diff Detail

Repository
rL LLVM

Event Timeline

sdmitrouk updated this revision to Diff 26161.May 20 2015, 10:38 AM
sdmitrouk retitled this revision from to [DebugInfo][FastISel] Prevent using debug location from previous block for local values.
sdmitrouk updated this object.
sdmitrouk edited the test plan for this revision. (Show Details)
sdmitrouk set the repository for this revision to rL LLVM.
sdmitrouk added a subscriber: Unknown Object (MLST).
asl added a subscriber: asl.May 20 2015, 12:47 PM
probinson edited edge metadata.May 20 2015, 1:01 PM

Does this handle the case where there is a call in the middle of the block? There is a special case for the local value map at that point, IIRC.
What happens when FastISel decides it can't finish selection for the block, and bails out to regular ISel?

Does this handle the case where there is a call in the middle of the block? There is a special case for the local value map at that point, IIRC.

Didn't try that, will do. This will need to account for calls of flushLocalValueMap, I thought it's used differently. Current version uses MBB->begin() which and probably won't work as expected. Thanks.

What happens when FastISel decides it can't finish selection for the block, and bails out to regular ISel?

Regular ISel already produces correct locations (hence this change uses same test as regular ISel). Not sure if there are some complicated cases which will lead to wrong results.

sdmitrouk updated this revision to Diff 26239.May 21 2015, 7:08 AM
sdmitrouk updated this object.
sdmitrouk edited edge metadata.

Thanks to Fred I see that this is basically the same as an old patch (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120917/151433.html) related to the bug of the same age (https://llvm.org/bugs/show_bug.cgi?id=14501), which was unfortunately rejected... Still pulled in Pauls nice test case and added missing changes here. I would prefer to have some kind of fix rather than have obviously wrong locations (provided that we don't make anything worse that it is at the moment).

dblaikie edited edge metadata.Jun 17 2015, 8:50 AM

Some comments - and it'd still be good to have Alexey take a look and see if this is likely to address some of the issues in line table fidelity he knows about.

I know it won't address them all - especially when inlining within a single basic block causes the same fallthrough-like line table behavior, but at least across basic block this should help tidy things up & avoid any dependence on block ordering.

lib/CodeGen/SelectionDAG/FastISel.cpp
136

I don't see anything EH related in your test cases - is this exercised/verified? (perhaps without this existing tests regress? I'm not sure)

142

Could use std::find_if here, if you like.

test/DebugInfo/X86/dbg-line-fast-isel.ll
11

Do the three blocks in this function all test interesting parts of the patch? Or could we use two? If they're all interesting it might be worth a comment to explain how/why.

Do these test functions need non-void returns, or could they be simplified to return void?

(does the callme function need a non-void return too? Does it need a parameter? (I assume it needs the parameter for some reason))

Some comments

Thanks, they are addressed in the next revision.

it'd still be good to have Alexey take a look and see if this is likely to address some of the issues in line table fidelity he knows about.

I'll subscribe him, maybe he'll find time to take a look.

lib/CodeGen/SelectionDAG/FastISel.cpp
136

I tried to repeat search of insert position performed by FastISel::recomputeInsertPt(). Now tried to get misplaced debug locations without this change and couldn't, so removed it.

142

Right, that's better.

test/DebugInfo/X86/dbg-line-fast-isel.ll
11

Do the three blocks in this function all test interesting parts of the patch? Or could we use two? If they're all interesting it might be worth a comment to explain how/why.

Not really, the first call isn't actually checked for anything, that check is in the second function.

Do these test functions need non-void returns, or could they be simplified to return void?

The first one does, because there is a check that 0 from return 0 has correct debug location.

does the callme function need a non-void return too?

It's not necessary, will be changed.

Does it need a parameter? (I assume it needs the parameter for some reason)

Yes, it's the parameter debug location we're worried about here, as it's a "local value" and code was changed to update location for local values.

sdmitrouk updated this revision to Diff 28017.Jun 19 2015, 5:23 AM
sdmitrouk edited edge metadata.

Addressing David's comments.

Hey Alexey, this patch might be of interest for you with regard to issues you face in code coverage tools and might fix some problems with debug locations there. Please take a look if you can.

Right. I'm happy with the direction of this patch, and think that removing some of cross-basic-block fallthrough is a positive behavior change. This is orthogonal to the work I plan to do in https://llvm.org/bugs/show_bug.cgi?id=23837, but your change may improve things for end users. I can't comment on the patch implementation, as I know too little of SelectionDAG internals, but please proceed once the reviewers are happy with this change :)

I was hoping to punt this to Alexey, but that seems to have failed (or succeeded only partially) & we perhaps need a FastISel person to take a look...

Yes, I think so... I'm happy with the patch, but would prefer someone with more knowledge of ISel to sign this off.

Ping.
@mcrosier maybe you could take a look at these changes since you're listed as Fast-ISel code owner?

mcrosier resigned from this revision.Sep 1 2015, 8:37 AM
mcrosier removed a reviewer: mcrosier.
mcrosier added a subscriber: mcrosier.

We might need to, a bit more explicitly, rope someone in.

Anyone know who we should talk to about ISel stuff to get their side of the
review on this patch? (Chad, Eric - could you guys point us in any
particular direction/throw someone under the bus?)

(switching to the right/new llvm mailing list)

David, I asked Chad two weeks ago about review or someone who could make a review, that's why he added @ributzka, but if Juergen doesn't have time, getting attention of another person competent in this area would be nice.

echristo edited edge metadata.Sep 16 2015, 4:47 PM

I apologize for the delay on this patch, I'm taking a look at it. I think I have some comments so I'm going to take a bit extra time and look at a couple of other things.

Thanks!

-eric

Ping. Eric, any comments on this patch?

Hi, I apologize for the delay on this.

So, this is a hack on top of a hack on top of an optimization :)

Basically the "right" thing we should do is emit location information on constant loads. However, that was removed a while back IIRC by (or because of) Paul as stepping through code would jump around quite a bit because of the optimization where we only want to load a particular constant once and so we load it at the top of the block with the last location used.

I think the best thing for this would be either to remove the hack where we don't put locations on constant loads (which will lead us back to jumping around in blocks) or we can come up with a better optimization to handle "we actually want the constant load as close to the use as possible without multiple constant loads".

Thoughts?

-eric

that was removed a while back IIRC by (or because of) Paul

"It was that way when I found it."

Loading constants at the top of the (sub-)block, with no attached location, is how things stood when I took my first crack at it. Having the first instruction in a block inherit a source location from the tail of the physically preceding block is obviously not good. Attaching the original source location to the constant load isn't really better, because of the jumping-around thing.

My original patch had problems, which is one reason it didn't go in (or went in and came back out, I don't recall). We've been carrying it locally and most recently Wolfgang Pieb has been tweaking it. I've cc'd him.

I guess my real fundamental question is: Why does FastISel work backwards from the end of the block anyhow? I'm pretty sure we've seen cases where a bunch of constants get loaded up front, then there's so much register pressure that you start seeing spills. Working forward from the beginning, constants would be loaded as they are needed, and can hang around for re-use later in the block without screwing up source location info. I think we'd see less spilling and for sure the source location info would be smoother.

dblaikie accepted this revision.Jan 14 2016, 2:35 PM
dblaikie edited edge metadata.

I think this is probably a good thing. Let's go for it - it's not too convoluted/going to weigh the codebase down much.

This revision is now accepted and ready to land.Jan 14 2016, 2:35 PM
dblaikie requested changes to this revision.Jan 14 2016, 2:43 PM
dblaikie edited edge metadata.

(un-accept, didn't mean to override some of the thread context & I missed it while reading back over this - so going back to a "discussion" state)

my basic thinking is: We know some instructions will never have locations on them (synthesized instructions for various reasons, etc) & if those instructions ever end up at the start of a block we'll probably get bad behavior (because they'll be assigned to whatever location is at the end of the block that happens to be laid out prior to this one) so we'll want this backpropagation regardless of any fixes we make to constants.

Yes, putting single-use constants at their use would be nice/good/better (especially for things like asan), maybe even putting multi-use constants at their first use (is that always possible, though? do we ever propagate a constant load from two basic blocks into a common prior block for example?) if possible.

This revision now requires changes to proceed.Jan 14 2016, 2:43 PM

(un-accept, didn't mean to override some of the thread context & I missed it while reading back over this - so going back to a "discussion" state)

my basic thinking is: We know some instructions will never have locations on them (synthesized instructions for various reasons, etc) & if those instructions ever end up at the start of a block we'll probably get bad behavior (because they'll be assigned to whatever location is at the end of the block that happens to be laid out prior to this one) so we'll want this backpropagation regardless of any fixes we make to constants.

Yes, putting single-use constants at their use would be nice/good/better (especially for things like asan), maybe even putting multi-use constants at their first use (is that always possible, though? do we ever propagate a constant load from two basic blocks into a common prior block for example?) if possible.

This is a pretty compelling argument, another thought is can we just assume line/column 0 on anything that doesn't have a location and actually use that in the output line table? Will gdb/lldb/other debuggers skip over things (this will help in the "what if we have a block with no instructions with locations on it" issue here) or will they do something silly?

aprantl edited edge metadata.Jan 14 2016, 5:05 PM
aprantl added a subscriber: friss.

This is a pretty compelling argument, another thought is can we just assume line/column 0 on anything that doesn't have a location and actually use that in the output line table? Will gdb/lldb/other debuggers skip over things (this will help in the "what if we have a block with no instructions with locations on it" issue here) or will they do something silly?

I can only speak for LLDB, but LLDB will correctly skip over any line 0 locations.

(un-accept, didn't mean to override some of the thread context & I missed it while reading back over this - so going back to a "discussion" state)

my basic thinking is: We know some instructions will never have locations on them (synthesized instructions for various reasons, etc) & if those instructions ever end up at the start of a block we'll probably get bad behavior (because they'll be assigned to whatever location is at the end of the block that happens to be laid out prior to this one) so we'll want this backpropagation regardless of any fixes we make to constants.

Yes, putting single-use constants at their use would be nice/good/better (especially for things like asan), maybe even putting multi-use constants at their first use (is that always possible, though? do we ever propagate a constant load from two basic blocks into a common prior block for example?) if possible.

FastISel is never going to hoist anything across basic-block boundaries. It never looks at more than one block at a time. FastISel is used only at -O0; would there ever be a post-ISel pass at -O0 that tried to do that kind of hoisting?

Conceptually, FastISel probably could do enough bookkeeping to be able to push a constant-materialization _down_ to the first use, which might be worth it just to save spills/restores. If the instructions did get pushed down to the first use, they could then be given the same debug loc as the use, which at least can't be any worse than things are now.

That would still potentially leave instructions at the top of the block that deal with phis, though, wouldn't it? which kind of inherently don't have debug locations, so shuffling the local-value instructions around doesn't fully solve the top-of-block problem.

(un-accept, didn't mean to override some of the thread context & I missed it while reading back over this - so going back to a "discussion" state)

my basic thinking is: We know some instructions will never have locations on them (synthesized instructions for various reasons, etc) & if those instructions ever end up at the start of a block we'll probably get bad behavior (because they'll be assigned to whatever location is at the end of the block that happens to be laid out prior to this one) so we'll want this backpropagation regardless of any fixes we make to constants.

Yes, putting single-use constants at their use would be nice/good/better (especially for things like asan), maybe even putting multi-use constants at their first use (is that always possible, though? do we ever propagate a constant load from two basic blocks into a common prior block for example?) if possible.

FastISel is never going to hoist anything across basic-block boundaries. It never looks at more than one block at a time. FastISel is used only at -O0; would there ever be a post-ISel pass at -O0 that tried to do that kind of hoisting?

Not necessarily the case. Imagine a use where isel time is important, but running a couple of optimizations after isn't quite as important.

Conceptually, FastISel probably could do enough bookkeeping to be able to push a constant-materialization _down_ to the first use, which might be worth it just to save spills/restores. If the instructions did get pushed down to the first use, they could then be given the same debug loc as the use, which at least can't be any worse than things are now.

That would still potentially leave instructions at the top of the block that deal with phis, though, wouldn't it? which kind of inherently don't have debug locations, so shuffling the local-value instructions around doesn't fully solve the top-of-block problem.

See the other solution that I mentioned :)

See the other solution that I mentioned :)

I did, and I've asked the debugger lead. Hopefully will have an answer tomorrow (he's in England).

Looking at the original example, it sounds like you're suggesting we would produce something like this:

BB#0:
    .loc 1 2 3
    call ...
    <constant load>
    .loc 1 5 6
    ...

BB#1:
    .loc 1 0 0
    <constant load>
    .loc 1 3 4
    ...

Am I understanding this correctly? It sure would be less bookkeeping.

I guess for single-stepping purposes, we want the debugger to just keep going until it found an instruction with a non-zero line number. For interspersed-source-display purposes, the line-zero instruction would want to be associated with .loc 1 3 4 rather than .loc 1 5 6.

See the other solution that I mentioned :)

I did, and I've asked the debugger lead. Hopefully will have an answer tomorrow (he's in England).

Currently, the SCE debugger ignores line-0 records, but we could retain them and arrange not to stop on those instructions.

Eric, can you find out whether gdb can deal with line 0 entries?

(un-accept, didn't mean to override some of the thread context & I missed it while reading back over this - so going back to a "discussion" state)

my basic thinking is: We know some instructions will never have locations on them (synthesized instructions for various reasons, etc) & if those instructions ever end up at the start of a block we'll probably get bad behavior (because they'll be assigned to whatever location is at the end of the block that happens to be laid out prior to this one) so we'll want this backpropagation regardless of any fixes we make to constants.

Yes, putting single-use constants at their use would be nice/good/better (especially for things like asan), maybe even putting multi-use constants at their first use (is that always possible, though? do we ever propagate a constant load from two basic blocks into a common prior block for example?) if possible.

Note that I played with this a while ago and it's not trivial to get right. The biggest issue is that the generation of a local constant might trigger the generation of other constants that it depends on. So you have to do book keeping not at the instruction level, but by groups. I stopped working on it because I thought the added complexity (and I suppose runtime cost) didn't fit into FastISel.

This is a pretty compelling argument, another thought is can we just assume line/column 0 on anything that doesn't have a location and actually use that in the output line table? Will gdb/lldb/other debuggers skip over things (this will help in the "what if we have a block with no instructions with locations on it" issue here) or will they do something silly?

While I really like the simplicity of that (if debuggers can cope), it means that the line table will have 0 entries at the beginning of most (I'm guessing the 'most' part here) basic blocks, which would make them much bigger (and thus slower to write out). If my fears are unfounded, then I think this 0 idea is by far the best compromise!

In D9887#328125, @friss wrote:

While I really like the simplicity of that (if debuggers can cope), it means that the line table will have 0 entries at the beginning of most (I'm guessing the 'most' part here) basic blocks, which would make them much bigger (and thus slower to write out). If my fears are unfounded, then I think this 0 idea is by far the best compromise!

If we say that we just have to set the line number, and not the file and column numbers, I think "much bigger" is not real. Setting line number to 0 typically would require a DW_LNS_advance_line which is 1 byte plus the SLEB128-encoded offset, i.e. the negative of the current line number. (We'll assume that we don't need to reset the current file or column.) Then it would take another one to put the line number back. That would generally be 4 or 6 bytes per basic-block-that-had-these-instructions, for source files up to ~8K lines.

Emitting column numbers probably costs about as much, if not more (generally 2 bytes each time the column number changes), with less clear benefit. Generating some actual data would be nice, of course, but I'm not inclined to worry about it.

friss added a comment.Jan 25 2016, 7:03 PM

I implemented the line 0 idea in http://reviews.llvm.org/D16569