This is an archive of the discontinued LLVM Phabricator instance.

[lldb/DWARF] Fix sizes of DW_OP_const[1248][us] and DW_OP_litN results
ClosedPublic

Authored by labath on Nov 5 2020, 4:50 AM.

Details

Summary

Dwarf says (Section 2.5.1.1. of DWARF v5) that these operations should
push "generic" (pointer-sized) values. This was not the case for
DW_OP_const operations (which were pushing values based on the size of
arguments), nor DW_OP_litN (which were always pushing 64-bit values).

The practical effect of this that were were unable to display the values
of variables if the size of the DW_OP_const opcode was smaller than the
value of the variable it was describing. This would happen because we
would store this (small) result into a buffer and then would not be able
to read sufficient data out of it (in Value::GetValueAsData). Gcc emits
debug info like this.

Other (more subtle) effects are also possible.

The same fix should be applied to DW_OP_const[us] (leb128 versions), but
I'm not doing that right now, because that would cause us to display
wrong (truncated) values of variables on 32-bit targets (pr48087).

Diff Detail

Event Timeline

labath created this revision.Nov 5 2020, 4:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 4:50 AM
labath requested review of this revision.Nov 5 2020, 4:50 AM
labath added inline comments.Nov 5 2020, 4:54 AM
lldb/source/Expression/DWARFExpression.cpp
945

This is probably not conforming either -- dwarf says the generic type is of "unspecified signedness", but I think it means a single global value -- not choosing signedness on a per-operand basis.

I've kept that for now, since that was the original behavior, though I didn't notice any issues with this bit removed.

lldb/unittests/Expression/DWARFExpressionTest.cpp
191–192

Had to rewrite these, as they were relying on the fact that DW_OP_const operations were not implicitly converting to the generic type.

dblaikie added inline comments.Nov 6 2020, 5:05 PM
lldb/source/Expression/DWARFExpression.cpp
944

Local/non-escaping lambdas may be simpler written with default ref capture - is GetAddressByteSize() expensive or would it be fine to call it every time this lambda is called? (& capturing opcodes by ref)

945

Fair, would be good to mark it as at least suspect/uncertain with a comment or such. (or if there's some inverse that would assert if this ever did become relevant, that assertion would be great - but I guess there probably isn't anything we can assert to check that this doesn't matter in practice)

Re: non-conforming: I think the presence of constu and consts encodings that build "generic type"d values probably means the generic type itself may have no particular notion of signedness, but the values that populate it do and I guess intend the value to be interpreted contextually or to carry the sign information? I think it's probably good to carry the sign information from the operands, at least. Maybe with some truncation/sign extension. It's probably pretty close to right...

1275

Why the change to emplace_back? Generally I'd advocate for push_back where possible, since it can't invoke explicit conversions which can require more scrutiny (eg: a vector of unique_ptrs can have a unique_ptr rvalue push_back'd, but can't have a raw pointer push_back'd (a raw pointer would have to be emplace_back'd) - so when I see push_back I can assume it's a "safe" operation)

1298–1305

Seems fair

labath updated this revision to Diff 303825.Nov 9 2020, 4:35 AM
labath marked an inline comment as done.
  • use push_back
  • use default & capture
  • use fix signedness (unsigned)
labath updated this revision to Diff 303842.Nov 9 2020, 5:44 AM

Go back to using the signedness of the operand. Implementing the generic type
correctly would be a (much) larger undertaking, so stick to the existing
semantics as much as possible.

lldb/source/Expression/DWARFExpression.cpp
944

GetAddressByteSize is not expensive. A default capture would work just fine.

945

The DWARF standard (particularly version 4, which does not speak about signedness at all), could definitely clearer, but I think that if the committee wanted to convey the notion that a value carries the sign information with it, they would have phrased it differently. To me, the current weasel wording seems to be intended to handle the fact that different architectures have different notions about the signedness of pointers. It this world the exact DW_OP operation used does not affect the signedness of the result, but only the fact whether sign- or zero-extension takes place while converting to the "generic" type.

This is interpretation seems to match gcc behavior:

  • gcc will happily use DW_OP_const1u to describe a (signed) int constant, even if that constant would fit into a DW_OP_const1s
  • it will also use DW_OP_const1s -1 as a shorter form of DW_OP_const4u 0xffffffff when describing (unsigned)-1.

In fact, these findings (and the fact that clang does not use the fixed-size operands at all) have convinced me to just fix the signedness of the generic type.

1275

I don't know. I wanted to be fancy? Normally, I am also against the overuse of emplace_back...

labath added inline comments.Nov 9 2020, 5:53 AM
lldb/source/Expression/DWARFExpression.cpp
945

After more experimentation I realized this is a rabbit hole. The signedness of the "generic type" is defined on a per-operand basis, but per-operation:

If the type of the operands is the generic type, except as otherwise specified, the
arithmetic operations perform addressing arithmetic, that is, unsigned arithmetic
that is performed modulo one plus the largest representable address.

The only operation which explicitly uses signed arithmetic is division, but that's like one of the few operations where this distinction matters. Interestingly the modulo operation does _not_ use signed arithmetic.

Even more interestingly, GCC seems to get all of these nuances right -- it writes a smod b by a - (a sdiv b)*b, and uses multiplicative inverses or DW_OP_(GNU_)convert to produce unsigned division. I don't know what llvm does, as I have not been able to coax it to produce these kinds of expressions, but it seems it has a lot of work to do, both on producer and consumer sides.

For now, I'm just going back to the use-signedness-of-operand implementation, as that is what we've done before.

JDevlieghere accepted this revision.Nov 9 2020, 2:09 PM

LGTM. It fixes the issue at hand and gets us closer to conforming to the standard.

lldb/source/Expression/DWARFExpression.cpp
948

Can't wait for C++20's templated lambdas ;-)

1275

TIL

This revision is now accepted and ready to land.Nov 9 2020, 2:09 PM
dblaikie accepted this revision.Nov 9 2020, 7:11 PM

Sounds good

This revision was landed with ongoing or failed builds.Nov 10 2020, 7:10 AM
This revision was automatically updated to reflect the committed changes.