This is an archive of the discontinued LLVM Phabricator instance.

[DWARFYAML] Implement the .debug_loclists section.
ClosedPublic

Authored by Higuoxing on Jul 21 2020, 5:31 AM.

Details

Summary

This patch implements the .debug_loclists section. There are only two
DWARF expressions are implemented in this patch (DW_OP_consts,
DW_OP_stack_value). We will implement more in the future.

The YAML description of the .debug_loclists section is:

debug_loclists:
  - Format:              DWARF32 ## Optional
    Length:              0x1234  ## Optional
    Version:             5       ## Optional (5 by default)
    AddressSize:         8       ## Optional
    SegmentSelectorSize: 0       ## Optional (0 by default)
    OffsetEntryCount:    1       ## Optional
    Offsets:             [ 1 ]   ## Optional
    Lists:
      - Entries:
          - Operator:          DW_LLE_startx_endx
            Values:            [ 0x1234, 0x4321 ]
            DescriptorsLength: 0x1234             ## Optional
            Descriptors:
              - Operator: DW_OP_consts
                Values:   [ 0x1234 ]

Diff Detail

Event Timeline

Higuoxing created this revision.Jul 21 2020, 5:31 AM
Higuoxing marked an inline comment as done.Jul 21 2020, 5:36 AM
Higuoxing added inline comments.
llvm/include/llvm/ObjectYAML/DWARFYAML.h
190

There are some operators that take a size and a block of that size. I'm not sure if it's ok to use a std::vector<> here.

Higuoxing edited the summary of this revision. (Show Details)Jul 21 2020, 5:40 AM
Higuoxing planned changes to this revision.Jul 21 2020, 6:53 AM

I am going to pull out the refactor work to a separate patch.

Higuoxing updated this revision to Diff 279515.Jul 21 2020, 7:22 AM

Pull out refactor works.

Higuoxing marked an inline comment as done.Jul 22 2020, 12:35 AM
Higuoxing added inline comments.
llvm/include/llvm/ObjectYAML/DWARFYAML.h
190

I think we can add a new field called BlockSize which is used to overwrite the size of blocks.

struct DWARFOperation {
  dwarf::LocationAtom Operator;
  Optional<yaml::Hex64> BlockSize;
  std::vector<yaml::Hex64> Values;
};

e.g., DW_OP_implicit_value takes two operands, the first operand is an ULEB128 size, the second operand is a block of that size. We can use std::vector<yaml::Hex64> to hold a series of bytes, and the size of this block can be overwritten by BlockSize.

I don't see anything out of the ordinary here, but I'll leave it others to have the final say.

llvm/include/llvm/ObjectYAML/DWARFYAML.h
190

Supporting DW_OP_entry_value is going to be amusing, as that takes an entire nested dwarf expression as an operand. But I think it would be fine even if the nested expression is simply printed in hex (in fact, I would be fine if initially _all_ dwarf expressions are simply printed in hex).

llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
23–93

Would it be better to verify this structurally by checking llvm-dwarfdump output ?

Higuoxing marked an inline comment as done.Jul 22 2020, 1:49 AM
Higuoxing added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
23–93

Hi @labath, I've tested the generated sections using llvm-dwarfdump on my local machine. We usually check the hex dump of sections to "bootstrap" yaml2obj. You can take a look at this thread :-) https://reviews.llvm.org/D82367#2134629

labath marked an inline comment as done.Jul 22 2020, 1:58 AM
labath added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
23–93

I see. That makes sense. Thanks for the pointer.

Higuoxing marked an inline comment as done.Jul 22 2020, 2:37 AM
Higuoxing added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
23–93

I'm wondering if it's helpful to put the output of llvm-dwarfdump in comments. e.g.,

## $ llvm-dwarfdump --debug-loclists -v %t.o
## .debug_loclists contents:
## 0x00000000: locations list header: length = 0x00000023, version = 0x0005, addr_size = 0x08, seg_size = 0x00, offset_entry_count = 0x00000002
## offsets: [
## 0x00000010 => 0x00000024
## 0x0000001a => 0x0000002e
## ]
## 0x00000024: 
##             DW_LLE_startx_endx     (0x0000000000001234, 0x0000000000004321): DW_OP_consts +4660
##             DW_LLE_end_of_list     ()
jhenderson added inline comments.Jul 22 2020, 2:43 AM
llvm/include/llvm/ObjectYAML/DWARFYAML.h
190

I guess longer-term, we could add another field to a DWARFOperation, that is itself an Optional<DWARFOperation>, to allow for a kind of linked-list approach to the expression. In the YAML, it might look like:

- Entries:
    - Operator:          DW_LLE_startx_endx
      Values:            [ 0x1234, 0x4321 ]
      DescriptorsLength: 0x4321
      Descriptors:
        - Operator: DW_OP_entry_value
          Expression:
            - Operator: DW_OP_entry_value
              Expression:
                - Operator: DW_OP_entry_value
                ...

An operation would only be allowed to have one of Values or Expression in that case. But I think you're right, there's no need to do that now. The behaviour in this patch would be enough for simple cases.

@Higuoxing, I think your idea with an optional BlockSize makes sense to me.

201–202

I think "Descriptions" is the canonical term in the spec, so we should probably stick with that, i.e. use DescriptionsLength and Descriptions.

llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

One of the things yaml2obj is useful for is for creating broken ELFs. I see you're adding further checks here to make sure the Values array is the right size for the type. But what if you want to have an incorrect number of values for a given operation?

There are similar issues elsewhere, I'm sure, so I'm not saying this specific patch needs changing at this point, but I do think you need to consider how to solve this case. For example, it might mean being more specific with the type of value by using distinct fields for different types. For example, instead of Values: [0x1234, 0x4321] you might want something like:

Values:
  - Type: SLEB128
    Value: 0x1234
  - Type: Address
    Value: 0x4321

or even:

Values:
  - SLEB128: 0x1234
  - Address: 0x4321

What do you think?

566

These lambdas are very similar to those elsewhere. Maybe you should pull them out into a function.

620

Similar comments apply here to my earlier thoughts regarding checking operands.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
2

I'll look at the testing more tomorrow.

23–93

I don't have a strong opinion on this either way. One of the problems is that this will rot easily if you make changes/expand the test, or if llvm-dwarfdump changes output slightly. I don't know if it gives any additional benefit beyond the comments.

One possible thought I had is that you could, in addition to the hex testing, add llvm-dwarfdump checks too. The hex check ensures the content is correct, whilst the llvm-dwarfdump check shows a) llvm-dwarfdump is correct, and b) provides a more readable way of showing yaml2obj is correct. If there's a failure in the llvm-dwarfdump check, you'd need to carefully inspect the hex to see if the hex check was wrong (i.e. yaml2obj has a bug) or there's a bug in llvm-dwarfdump. This would probably need some sort of commenting to explain the process. Maybe a readme.txt in the test directory would be enough, I don't know.

labath added inline comments.Jul 22 2020, 5:37 AM
llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

I've been wondering about that too. I personally don't find the ability to add more/less "values" than the appropriate DW_LLE kind expects useful, because the yaml structure will have no relation to how the result ends up being parsed. And the extra info needed to make this operation well-defined makes the yaml more verbose for the normal cases.

What I would find useful is if, at certain strategic levels, it would be possible to insert an arbitrary hex blob instead of fully formed yaml. One such place could be the dwarf expression -- instead of all the DW_OP thingies we could just add a hex blob. That would also give an escape hatch for the obj2yaml path so that it can serialize any DWARF expression with unknown/unsupported opcodes as hex.

Another such level could be the level of an individual location list -- if there's anything funny about the location list, one could put the entire list as hex, even though some parts of it could be valid and representable normally. I think that's fine because this is not likely to be needed often. Or maybe this could apply to an entire debug_loclists contribution. Or both...

At the last level, one could specify the entire section contents as hex, which is sort of what we have now.

jhenderson added inline comments.Jul 22 2020, 5:44 AM
llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

That makes sense to me. There are some cases in the ELF code, where we fall back to using hex section contents for known section types, when the content is unparseable for whatever reason.

Higuoxing marked an inline comment as done.Jul 22 2020, 6:09 AM
Higuoxing added inline comments.
llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

One of the things yaml2obj is useful for is for creating broken ELFs. I see you're adding further checks here to make sure the Values array is the right size for the type. But what if you want to have an incorrect number of values for a given operation?

Sounds good! What about having an additional field called Types. If the Types field is specified, yaml2obj emits values according to the Types array. If it doesn't exist, yaml2obj emits values according to the spec.

- Operator: DW_LLE_startx_endx
  Types:    [ ULEB128, Hex64 ]
  Values:   [ 0x1234, 0x4321 ]

The benefit is that we don't have to specify the Value field every time. The number of operands can also be adjusted via the Types array.

- Operator: DW_LLE_startx_endx
  Types:    [ ULEB128, Hex64, Hex16 ]
  Values:   [ 0x1234, 0x4321, 0x12 ]
labath added inline comments.Jul 22 2020, 7:36 AM
llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

I sort of like that because it leaves the common case succinct, but it still seems like more flexibility than needed. I am not really opposed to that -- I'm sure it can be useful at times -- but I'm not sure the use will be frequent enough to justify the added complexity (you'd still need to handle all the situations when the types array is not specified _and_ also the case where the type array is there). Since yaml tests are typically done by taking obj2yaml output and modifying/reducing it, and obj2yaml will never produce output like that, I'm not sure how many people will even discover it.

Nit: I don't think HexNN is a good name because it bears no relation to how the input is written (one can just as easily write the value array in decimal), and the fact that the output is going to be hex (binary) is kinda obvious. Maybe just IntNN, or SignedNN/UnsignedNN is singedness matters (I don't think it does).

Higuoxing updated this revision to Diff 280024.Jul 22 2020, 9:14 PM
Higuoxing marked 3 inline comments as done.

Address comments.

Higuoxing marked an inline comment as done.Jul 22 2020, 11:18 PM
Higuoxing added inline comments.
llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

Nit: I don't think HexNN is a good name because it bears no relation to how the input is written (one can just as easily write the value array in decimal), and the fact that the output is going to be hex (binary) is kinda obvious. Maybe just IntNN, or SignedNN/UnsignedNN is singedness matters (I don't think it does).

Oh, thanks a lot! I've upload a diff revision D84386 that illustrates my idea. Any thoughts?

The DWARF v5 standard explicitly allows for a value of 0 for the offset_entry_count, meaning there are no offset entries. My reading is that this is allowed even if there are location lists, but I don't see any testing for that situation, I think? I believe the same applies for .debug_rnglists, but I haven't looked to see if you implemented that there.

llvm/lib/ObjectYAML/DWARFEmitter.cpp
445–447

I like the approach personally. I've commented more on D84386.

448

It looks like this lambda could be using checkListEntryOperands with a bit of a tweak to that function's name.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
352

omitted in -> omitted from

427

Missing a comment for the end of list markers in both sequences?

501

I think this case is tested by case e)?

526–527

Looking at this, I think it might be best to test the majority of this in a gtest unit test. It seems like there's a lot of detail that would better fit there. I might be wrong though. What do you think?

Higuoxing updated this revision to Diff 280814.Jul 27 2020, 1:33 AM
Higuoxing marked 7 inline comments as done.

Address comments.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
427

It's not the end_of_list marker, it's a counted location description, whose length is 0.

501

Removed.

526–527

I'm not sure. The number of operands is checked in the section emitting period. The gtest unit testing currently only supports testing the YAML parser.

Higuoxing updated this revision to Diff 280821.Jul 27 2020, 1:52 AM

Remove unrelated changes.

Higuoxing updated this revision to Diff 280886.Jul 27 2020, 6:17 AM

checkOperandsNumber -> checkOperandNumber

The DWARF v5 standard explicitly allows for a value of 0 for the offset_entry_count, meaning there are no offset entries. My reading is that this is allowed even if there are location lists, but I don't see any testing for that situation, I think? I believe the same applies for .debug_rnglists, but I haven't looked to see if you implemented that there.

I might be missing this, but was this point addressed yet?

llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
758–759

Rather than a known value that we one day will likely support, causing this test to be updated, perhaps we should pick a value that isn't supported by the DWARF spec?

Higuoxing updated this revision to Diff 281461.Jul 28 2020, 9:22 PM
Higuoxing marked an inline comment as done.
  • Add test cases u, v, w.
Higuoxing added inline comments.Jul 28 2020, 9:26 PM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
758–759

I'm not sure if I'm doing right here. If we pick a value that isn't supported by the DWARF sepc, the error message

yaml2obj: error: DWARF expression: DW_OP_blah is not supported
                                   ^~~~~~~~~~

will be left untested.

jhenderson added inline comments.Jul 30 2020, 12:38 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
758–759

Ah, I see. Perhaps we should have both then. (One with arbitrary unrecognised hex, and the other using a recognised value which isn't supported).

809

This is fine, but I think an expectation might be that if OffsetEntryCount: 0 is specified, (and not Offsets: []), it would have the same effect.

I think you want two more cases:

  1. OffsetEntryCount: 0 and no Offsets key at all (will result in no offsets).
  2. OffsetEntryCount: 0 and a non-empty Offsets key (will result in 0 in offset_entry_count, but with offsets).

Does that make sense to you?

Higuoxing marked 2 inline comments as done.Aug 2 2020, 9:03 PM
Higuoxing added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
758–759

I've added one test that use a recognized value DW_OP_entry_value since it might not be implemented recently, so that we don't have to update this test very frequently.

809

Done. But we should land D85006 first.

Higuoxing updated this revision to Diff 282498.Aug 2 2020, 9:05 PM
Higuoxing marked 2 inline comments as done.

Address review comments.

Updated tests: (t), (w), (x), (y).

jhenderson added inline comments.Aug 3 2020, 1:54 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-loclists.yaml
846–847

Same comments apply here as in D85006.

Higuoxing updated this revision to Diff 282562.Aug 3 2020, 3:14 AM
Higuoxing marked an inline comment as done.

Address review comments.

Thanks for reviewing!

Seems that there are too many changes since previous revision. Most of the changes are caused by rebase. Only test cases (w), (x), (y) are updated.

jhenderson accepted this revision.Aug 3 2020, 3:46 AM

Looks good to me!

This revision is now accepted and ready to land.Aug 3 2020, 3:46 AM
Higuoxing updated this revision to Diff 282619.Aug 3 2020, 7:31 AM

Re-run pre-merge tests.

This revision was landed with ongoing or failed builds.Aug 3 2020, 8:20 AM
This revision was automatically updated to reflect the committed changes.