This is an archive of the discontinued LLVM Phabricator instance.

Prevent the creation of empty location list ranges.
Needs ReviewPublic

Authored by friss on Dec 2 2014, 8:04 PM.

Details

Summary

We happen to create such ranges today depending on teh scheduling of the
DBG_VALUE instructions. An empty range is not only useless, it can also
be a real issue if it happens to span the 0-0 address range (address ranges
in loaction lists are function relative). 0-0 is defined as the end of
location list marker, thus if such a range is generated it will hide the
other entries of the list. A Dwarf consumer like llvm-dwarfdump that
tries to read the .debug_loc section linearly will get totally lost (this
way of reading .debug_loc could arguably be considered a llvm-dwarfdump
bug though).

Note that the patch as-is replaces a std::pair by a struct containing 'first'
and 'second' members. I did this to expose only the logic changes in this
patch. I can do proper renaming as a first step if we agree on the patch's
direction.

Note also that the patch doesn't contain a real test for the empty range
deletion. It might be tricky to come up with something, but I can try if
we agree that the patch DTRT. The testing part of this patch is the fixup
of 2 tests that happened to pass by mistake as the variables they were
testing referred to invalid empty location lists. Once these ranges gone,
the variable is gone too. I XFAILed one test that wouldn't test anything
if modified and I removed the failing parts of the other one (turns out it
was already failing on linux targets).

Diff Detail

Event Timeline

friss updated this revision to Diff 16849.Dec 2 2014, 8:04 PM
friss retitled this revision from to Prevent the creation of empty location list ranges..
friss updated this object.
friss added reviewers: echristo, dblaikie, aprantl, samsonov.
friss added a subscriber: Unknown Object (MLST).
dblaikie edited edge metadata.Dec 2 2014, 8:53 PM

Yeah, I'm mildly curious to get a better understanding of where this comes up & whether we should just be dropping the dbg.value intrinsics earlier rather than waiting until we find that they describe nothing of interest - but it seems totally plausible that that wouldn't be easy/clean/convenient to do and that this (what you've proposed) is the right place to do it.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
48

I don't follow what's going on here - maybe a comment is in order? (not sure why it's only pushing back if the back() is true). I guess whatever the answer is also explains why SeenInstructionInRange starts with size 1 rather than empty.

OK, I think I see what's going on here - rather than having entry in SeenInstructionInRange per live range, you avoid having them in the case where two ranges start together (or are not "valid" before teh second one starts, at least). Bit subtle, but I think I get it - an alternative idea/question:

How do you avoid having to walk all the elements of SeenInstructionInRange when performing validateOpenRanges? I would've imagined/expected that you'd need to set all the elements of the container to true, not just the last one, no?

In any case - could we just keep booleans directly in the InstrRanges - and validateOpenRanges would iterate through InstrRanges and set them all to true?

50

The pointer to the SmallVector element could end up dangling if the SmallVector is reallocated.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.h
39

Do we need one of these on every InstrRange? I suppose this comes up if we have multiple open InstrRanges at the same time? Do we support/implement that scenario? (I imagine we probably don't - in which case we only need "NonEmpty" for the currently open InstrRange - so perhaps InstrRanges should be a struct of SmallVector<InstrRange> + bool *NonEmpty, so we just have it once per InstrRanges, rather than once for every range within the InstrRanges)

48

'validate' seems a bit vague, but maybe sufficient

test/DebugInfo/X86/block-capture.ll
9 ↗(On Diff #16849)

Just because we don't generate any locations doesn't mean we should omit the variable entirely - maybe a comment explaining that the missing variable is a (rather severe) bug, even if it doesn't have any location (another, perhaps less severe, bug, most likely)

friss added a comment.Dec 2 2014, 10:24 PM
In D6497#4, @dblaikie wrote:

Yeah, I'm mildly curious to get a better understanding of where this comes up & whether we should just be dropping the dbg.value intrinsics earlier rather than waiting until we find that they describe nothing of interest - but it seems totally plausible that that wouldn't be easy/clean/convenient to do and that this (what you've proposed) is the right place to do it.

In my opinion this is the only place where we can safely do that, because it's the only place where we are sure that Instructions are at their final place in the asm flow.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
48

Basically each entry in SeenInstructionInRange represents the status of a series of DBG_VALUEs that start at the same PC. If the last entry is false, we can just reuse it, our DBG_VALUE starts at the same PC that all the current still-empty ranges. If the last entry is true, we need to add a new one that will get false till the nest real instruction.
Thus the container always contains only 'true' elements except maybe the last one that indicates that there are newly opened ranges that haven't seen an instruction yet.

And the fact that a range is opened first doesn't mean it'll get closed first. We need to keep the old 'true' values that are pointed to live.

It's a bit subtle, but it's the only way I could think of that keeps the pass algorithmic complexity the same. I fear that iterating the InstrRanges could lead to some really bad pathological cases.

If you want to store the flag directly in the InstrRange, you need a way to iterate only the non-validated ones to mark them non-empty. This isn't trivial as the primary storage of the InstrRange is in a SmallVector, thus their address isn't stable (I haven't investigated if there would be a drawback to storing them in a list instead).

I prefer the simplicity of just toggling the status for all the current entries by just modifying a bool value that everybody points to.

50

Bummer... I knew I had a good reason for having implemented that as a list first. And then I added the call to SeenInstructionInRange.resize() and thought that this would definitely be more efficient with a vector.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.h
39

We have multiple Ranges in the Same vector with different statuses. Today it's due to what I consider a bug: we never close ranges that aren't described by a register (i.e. a constant variable). But ultimately, even for registers it might make sense to have overlapping ranges for the same variable (the fact that a new DBG_VALUE describes the variable doesn't mean that the value mysteriously disappeared from the previous register).

So yes, I explicitly designed this solution to support that scenario.

48

In an earlier iteration the added field in InstrRange was called 'Valid'. I could rename that to markOpenRangesNonEmpty().

test/DebugInfo/X86/block-capture.ll
9 ↗(On Diff #16849)

I thought this was implicit due to the XFAIL, but I can make it even more visible.

aprantl added inline comments.Dec 3 2014, 9:23 AM
test/DebugInfo/X86/block-capture.ll
9 ↗(On Diff #16849)

I'll have a look at this and see how we can fix the test case.

test/DebugInfo/X86/dbg-value-inlined-parameter.ll
34

If we drop the DW_TAG_formal_parameter completely, the function signature doesn't match the subroutine type any more. I think the formal parameter should still be there, even if it has no location.

friss updated this revision to Diff 16905.Dec 3 2014, 6:30 PM
friss edited edge metadata.
  • Implement SeenInstructionInRange as a forward_list to guarantee element pointer stability.
  • Add more in depth comment.
  • Rename validateOpenRanges to markOpenRangesNonEmpty.
aprantl edited edge metadata.Dec 5 2014, 10:24 AM

I regenerated block-capture.ll from source in r223492.

dblaikie added inline comments.Dec 8 2014, 9:45 AM
lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
49

So 'SeenInstructionInRange' will just keep growing as the function grows? Should we try to limit it in any way.

Perhaps an alternative strategy would be to have a "currently unseen" std::shared_ptr<bool>, and then in "markOpenRangesNonEmpty" it would be:

if (CurrentlySeen)
  *CurrentlySeen = true;
CurrentlySeen = llvm::make_unique<bool>(false);

and then in "endInstrRange" the std::shared_ptr<bool> would be:

if (*Ranges.back().NonEmpty) {
  Ranges.back().second = &MI;
  Ranges.back().NonEmpty = nullptr;
} ...

(would this make it any easier to move the NonEmpty std::shared_ptr<bool> to the InstrRanges rather than into its elements (so we just have one per range set, rather than one per range)? I forget)

In any case, hoping Alexey can weigh in, as the person who (re)wrote most of this code most recently.

friss added inline comments.Dec 8 2014, 12:22 PM
lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
49

Yes, SeenInstructionInRange only grows with the function. It will however never contain more items than there are ranges, thus the space complexity of the DbgValueHistoryCalculator stays the same.

Your suggestion would work, but I'm not sure it is easier to grok for the reader :-) Putting the status in the range set rather than in the range itself would require tracking how many open ranges are empty for this set, which shouldn't be an issue.

Piggy backing on your shared_ptr idea, here's another one requiring at most one shared_ptr<bool> at a time and with lifetime semantics a bit simpler than in your proposal: make NonEmpty a weak_ptr (and call it Empty). The value of the pointed bool doesn't matter, but the weak_ptr validity is the real information.

markOpenRanges becomes:

if (SeenInstructionInRange.use_count())  // Are there empty open ranges?
  SeenInstructionInRange = make_shared<bool>(true); // This invalidates all current weak_ptrs

and in endInstrRange:

if (!Ranges.back().Empty.lock())
  Ranges.back().second = &MI;
else
  Ranges.pop_back();

There is some hidden complexity in the shared_ptr handling though (management of the shared_ptr use lists), that make me believe that the proposed solution is still more efficient. I don't think this is a big deal because these lists should be short, but I wouldn't be surprised if some pathological case shows up one day.

(and yes, Alexey's input would be much welcome)

samsonov edited edge metadata.Dec 9 2014, 3:30 PM

Finally looked at this patch, sorry for the delay.

First of all, I dislike the extension of InstrRange structure, and adding the extra list to DbgValueHistoryMap. After we returned from calculateDbgValueHistory(), we don't need and never use InstrRange::NonEmpty pointers, and the list, so it's not nice we leave this extra data lying around.

The comment about InstrRange explicitly tells that instruction range *may not* be terminated - it means the range is assumed to be valid either until the start of the next range, or until the end of function. However, you pop_back() "empty" instruction ranges only in endInstrRange() function. It means you may still end up with empty ranges if they were never closed.

Can we do the following instead: have another member:

std::map<const MDNode *, InstrRange> emptyRanges;

which would describe the current, opened and empty, range for a given variable. It means that smth. like

const InstrRange& getCurrentRangeForVar(const MDNode *Var);

would return either the entry from emptyRanges, or the last value of VarInstrRanges vector.

Then startInstrRange would simply add new range, and markOpenRangesNonEmpty() would move the contents of emptyRanges to VarInstrRanges. At the end of calculateDbgValueHistory, we can just discard the contents of emptyRanges.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
232

Hm, this code is run after the register allocator, can you use isTransient() here? If no, maybe we can introduce MachineInstr::isPseudoInstruction() and use it in MachineInstr::isTransient() instead?

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.h
57

Note that this can leave your SeenInstructionInRange list have a single "true" value. This would probably still work, though.

In D6497#12, @samsonov wrote:

Finally looked at this patch, sorry for the delay.

First of all, I dislike the extension of InstrRange structure, and adding the extra list to DbgValueHistoryMap. After we returned from calculateDbgValueHistory(), we don't need and never use InstrRange::NonEmpty pointers, and the list, so it's not nice we leave this extra data lying around.

I can understand that and especially the liveness of the NonEmpty field bothers me. However, let me restate my goal: I wanted to find a way to keep the pass at the same algorithmic complexity. DbgValueHistoryCalculator has already shown up on profiles and I wanted to avoid introducing new potentially costly bookkeeping logic. That being said, I might be too careful. (And I even failed at that, because the destructor of the list<> will do a O(<number of Dbg_VALUE>) walk anyway).

The comment about InstrRange explicitly tells that instruction range *may not* be terminated - it means the range is assumed to be valid either until the start of the next range,

(not really relating to that actual patch: I do not see why the beginning of a variable range should close the previous one for that variable. A variable might be present in 2 registers at some point it is even explicitly stated in the Dwarf standard. I even think you can't get the end result right if you try to prevent this from happening.)

or until the end of function. However, you pop_back() "empty" instruction ranges only in endInstrRange() function. It means you may still end up with empty ranges if they were never closed.

This is true, however the empty ranges could only appear at the very end of the function which is less of an issue. For an empty range to happen there, something would need to have generated a DBG_VALUE after the last instruction of the function (Is that even possible? isn't there a terminator requirement like at the higher level?). But yes, this pass runs so late (meaning the DBG_VALUEs could have been mishandled by so many things) that we need to take care of that case if we want to be exhaustive.

Can we do the following instead: have another member:

std::map<const MDNode *, InstrRange> emptyRanges;

which would describe the current, opened and empty, range for a given variable.

You can have multiple open ranges for a variable (range entries describing constants will never be closed for example). I do not like the limitation to only one pending range. Thus we'd need to make the map value type a list.

It means that smth. like

const InstrRange& getCurrentRangeForVar(const MDNode *Var);

would return either the entry from emptyRanges, or the last value of VarInstrRanges vector.
Then startInstrRange would simply add new range, and markOpenRangesNonEmpty() would move the contents of emptyRanges to VarInstrRanges. At the end of calculateDbgValueHistory, we can just discard the contents of emptyRanges.

I'll try to use a simpler approach with a map<MDnode *, list<InstrRange>> if everybody agrees that the additional bookkeeping shouldn't be an issue.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
232

I reused the way AsmPrinter::EmitFunctionBody() counts the real emitted instructions. isTransient() seems to DTRT, I could use that.

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.h
57

I considered adding a comment for that pointing out that it doesn't matter. The important thing is that the list shouldn't be empty. Now that I look at it again, even though the comment wouldn't hurt, it would actually be longer than reseting the front value to false :-)

In D6497#100223, @friss wrote:
In D6497#12, @samsonov wrote:

Finally looked at this patch, sorry for the delay.

First of all, I dislike the extension of InstrRange structure, and adding the extra list to DbgValueHistoryMap. After we returned from calculateDbgValueHistory(), we don't need and never use InstrRange::NonEmpty pointers, and the list, so it's not nice we leave this extra data lying around.

I can understand that and especially the liveness of the NonEmpty field bothers me. However, let me restate my goal: I wanted to find a way to keep the pass at the same algorithmic complexity. DbgValueHistoryCalculator has already shown up on profiles and I wanted to avoid introducing new potentially costly bookkeeping logic. That being said, I might be too careful. (And I even failed at that, because the destructor of the list<> will do a O(<number of Dbg_VALUE>) walk anyway).

The comment about InstrRange explicitly tells that instruction range *may not* be terminated - it means the range is assumed to be valid either until the start of the next range,

(not really relating to that actual patch: I do not see why the beginning of a variable range should close the previous one for that variable. A variable might be present in 2 registers at some point it is even explicitly stated in the Dwarf standard. I even think you can't get the end result right if you try to prevent this from happening.)

I don't say that ranges should be non-overlapping - they shouldn't in general case. I was just telling that apparently it was the case when this code was refactored, and this assumption seems to be baked in there. For instance, DbgValueHistoryMap::getRegisterForVar() assumes returns a single value, apparently assuming that a variable can be stored in a single register. We'd need to carefully audit the code, see how it works with new DwarfDebug::buildLocationList method, etc. I'm sort of surprised that the latter works with the current
structure of DbgValueHistoryMap::InstrRanges.

or until the end of function. However, you pop_back() "empty" instruction ranges only in endInstrRange() function. It means you may still end up with empty ranges if they were never closed.

This is true, however the empty ranges could only appear at the very end of the function which is less of an issue. For an empty range to happen there, something would need to have generated a DBG_VALUE after the last instruction of the function (Is that even possible? isn't there a terminator requirement like at the higher level?). But yes, this pass runs so late (meaning the DBG_VALUEs could have been mishandled by so many things) that we need to take care of that case if we want to be exhaustive.

Can we do the following instead: have another member:

std::map<const MDNode *, InstrRange> emptyRanges;

which would describe the current, opened and empty, range for a given variable.

You can have multiple open ranges for a variable (range entries describing constants will never be closed for example). I do not like the limitation to only one pending range. Thus we'd need to make the map value type a list.

It means that smth. like

const InstrRange& getCurrentRangeForVar(const MDNode *Var);

would return either the entry from emptyRanges, or the last value of VarInstrRanges vector.
Then startInstrRange would simply add new range, and markOpenRangesNonEmpty() would move the contents of emptyRanges to VarInstrRanges. At the end of calculateDbgValueHistory, we can just discard the contents of emptyRanges.

I'll try to use a simpler approach with a map<MDnode *, list<InstrRange>> if everybody agrees that the additional bookkeeping shouldn't be an issue.

friss added a comment.Dec 11 2014, 3:54 PM
In D6497#100223, @friss wrote:
In D6497#12, @samsonov wrote:

Finally looked at this patch, sorry for the delay.

First of all, I dislike the extension of InstrRange structure, and adding the extra list to DbgValueHistoryMap. After we returned from calculateDbgValueHistory(), we don't need and never use InstrRange::NonEmpty pointers, and the list, so it's not nice we leave this extra data lying around.

I can understand that and especially the liveness of the NonEmpty field bothers me. However, let me restate my goal: I wanted to find a way to keep the pass at the same algorithmic complexity. DbgValueHistoryCalculator has already shown up on profiles and I wanted to avoid introducing new potentially costly bookkeeping logic. That being said, I might be too careful. (And I even failed at that, because the destructor of the list<> will do a O(<number of Dbg_VALUE>) walk anyway).

The comment about InstrRange explicitly tells that instruction range *may not* be terminated - it means the range is assumed to be valid either until the start of the next range,

(not really relating to that actual patch: I do not see why the beginning of a variable range should close the previous one for that variable. A variable might be present in 2 registers at some point it is even explicitly stated in the Dwarf standard. I even think you can't get the end result right if you try to prevent this from happening.)

I don't say that ranges should be non-overlapping - they shouldn't in general case. I was just telling that apparently it was the case when this code was refactored, and this assumption seems to be baked in there. For instance, DbgValueHistoryMap::getRegisterForVar() assumes returns a single value, apparently assuming that a variable can be stored in a single register. We'd need to carefully audit the code, see how it works with new DwarfDebug::buildLocationList method, etc. I'm sort of surprised that the latter works with the current structure of DbgValueHistoryMap::InstrRanges.

Yeah, my comment was more about the state of the code than about your reply. I'd really like to find time to really revamp that part of the debug info flow. The fact is that this is usually only used for optimized code and nobody really cares today about the quality of the debug information in that case. It's also used for -O0 ASAN instrumented code, but it's still something that very few people (if any) will notice if it gives wrong results.

Fred

or until the end of function. However, you pop_back() "empty" instruction ranges only in endInstrRange() function. It means you may still end up with empty ranges if they were never closed.

This is true, however the empty ranges could only appear at the very end of the function which is less of an issue. For an empty range to happen there, something would need to have generated a DBG_VALUE after the last instruction of the function (Is that even possible? isn't there a terminator requirement like at the higher level?). But yes, this pass runs so late (meaning the DBG_VALUEs could have been mishandled by so many things) that we need to take care of that case if we want to be exhaustive.

Can we do the following instead: have another member:

std::map<const MDNode *, InstrRange> emptyRanges;

which would describe the current, opened and empty, range for a given variable.

You can have multiple open ranges for a variable (range entries describing constants will never be closed for example). I do not like the limitation to only one pending range. Thus we'd need to make the map value type a list.

It means that smth. like

const InstrRange& getCurrentRangeForVar(const MDNode *Var);

would return either the entry from emptyRanges, or the last value of VarInstrRanges vector.
Then startInstrRange would simply add new range, and markOpenRangesNonEmpty() would move the contents of emptyRanges to VarInstrRanges. At the end of calculateDbgValueHistory, we can just discard the contents of emptyRanges.

I'll try to use a simpler approach with a map<MDnode *, list<InstrRange>> if everybody agrees that the additional bookkeeping shouldn't be an issue.

friss updated this revision to Diff 17350.Dec 16 2014, 11:50 AM
friss edited edge metadata.
  • Reimplement in a more straightforward way.

So this commit doesn't pass the testsuite as a new test added last week by
Adrian would generate an empty range and thus be cleaned up by this patch.
I'm still posting it for reference. Another issue I noticed: the patch handles
locations the same way as the current logic does, using always the latest
started range. This makes no real sense as the instruction that makes us
close the range might be cloberring the register of any of the currently
opened ranges for the variables.

If you think it is still a good idea to apply something like that, just tell
me. Otherwise, I'm going to try to find some time to finally rewrite the pass
more in depth, because each time I touch it I realize a bit more how broken
it is.