This is an archive of the discontinued LLVM Phabricator instance.

[Debug-Info] [llvm-dwarfdump] Don't treat DW_FORM_data4/8 to be section offsets in DWARF3.
AbandonedPublic

Authored by Esme on Jul 6 2021, 3:06 AM.

Details

Reviewers
aprantl
probinson
dblaikie
jhenderson
shchenz
JDevlieghere
jdoerfert
Group Reviewers
Restricted Project
Summary

We found a similar error as the description in D58698 when dumping the debug info of the SPEC xalancbmk_r with llvm-dwarfdump under DWARF3, ie. the location list overflows the debug_loc section.

Although in DWARF3 DW_FORM_data4 and DW_FORM_data8 served also as a section offset, but they may actually be size values instead of offsets in some scenarios. The overflow-error occurred when we try to dump the location list for such scenarios.

After narrowing down the cases, we found the error occurs for three attributes, DW_AT_data_member_location, DW_AT_count and DW_AT_byte_size.

DW_FORM_data4 and DW_FORM_data8 are obviously not offsets in DWARF3+, but it’s hard to tell whether they are offsets in DWARF3. In this patch, we are dropping the support of dumping location for DW_AT_data_member_location, DW_AT_count and DW_AT_byte_size if the form is DW_FORM_data4 and DW_FORM_data8.

Diff Detail

Event Timeline

Esme created this revision.Jul 6 2021, 3:06 AM
Esme requested review of this revision.Jul 6 2021, 3:06 AM
Herald added a project: Restricted Project. · View Herald Transcript

In DWARF2, there isn't a specific form-class for pointing to location lists.
DW_AT_byte_size is always a constant, so DW_FORM_data4/8 are not location-list pointers.
DW_AT_count is either a constant or a reference (to a DIE describing how to compute the count), so DW_FORM_data4/8 are not location-list pointers.
DW_AT_data_member_location is always a location description (either embedded in a block, or referring to .debug_loc), so DW_FORM_data4/8 are actually location-list pointers.

In DWARF3, the forms DW_FORM_data4 and DW_FORM_data8 can be class constant or class loclistptr, depending on the attribute, and if a given attribute allows both, loclistptr is assumed.
DW_AT_byte_size is either a block, constant, or reference, so DW_FORM_data4/8 are not location-list pointers.
DW_AT_count is either a block, constant, or reference, so DW_FORM_data4/8 are not location-list pointers.
DW_AT_data_member_location is either a block, constant, or loclistptr, so DW_FORM_data4/8 must be considered pointers to a location list.

Starting in DWARF4, attributes pointing to a location list would use DW_FORM_sec_offset, and forms data4/data8 are always class constant, never location-list pointers.

This means that DW_AT_byte_size can never point to a location list, and it was a bug to have it say it could (similarly for at least DW_AT_bit_size, and we probably should re-review the entire list). I suppose as an extension it could point to a location list, which would allow different size values based on the current execution address, although the only case I'm familiar with would derive the size from a variable, and so not have to have an actual location list.

Similarly, DW_AT_count can never point to a location list, with the same caveat as for DW_AT_byte_size.

DW_AT_data_member_location is another story; FORM_data4/data8 are supposed to be interpreted as location-list pointers in DWARF 2 and 3. I think the post-commit comments in D58698 suggest that we could always have a version, and I doubt we'll get this completely sorted out until DWARFFormValue requires it.

shchenz added a comment.EditedJul 7 2021, 1:54 AM

This means that DW_AT_byte_size can never point to a location list, and it was a bug to have it say it could (similarly for at least DW_AT_bit_size, and we probably should re-review the entire list). I suppose as an extension it could point to a location list, which would allow different size values based on the current execution address, although the only case I'm familiar with would derive the size from a variable, and so not have to have an actual location list.

DW_AT_byte_size can not be a pointer to a location list, but it could be a exprloc/block kind location, so I guess it is ok to return true for DW_AT_byte_size in function mayHaveLocationDescription(). That function seems check for location descriptions for both exprloc/block kind location and also a location in .debug_loc section. See the function dumpLocation() when mayHaveLocationDescription() returns true.

Maybe a more clear way is to add a new function to check loclistptr related attributes and a new function to dump locations related to loclistptr?

DW_AT_data_member_location is another story; FORM_data4/data8 are supposed to be interpreted as location-list pointers in DWARF 2 and 3. I think the post-commit comments in D58698 suggest that we could always have a version, and I doubt we'll get this completely sorted out until DWARFFormValue requires it.

For DW_AT_data_member_location at DWARF3(default DWARF version for XCOFF), seems now we can not tell if the FORM_data4/data8 are for loclistptr or for constant. for the test case formclass3-atmemloc.s, it is a constant, so making it as location offset to the .debug_loc causes llvm-dwarfdump emit error message, I think we need to fix this. But if we always treat it as constant, it will also cause issue if the value is indeed an offset to .debug_loc. Maybe we have to find a way to differentiate FORM_data4/data8 for loclistptr and FORM_data4/data8 for constant.

This means that DW_AT_byte_size can never point to a location list, and it was a bug to have it say it could (similarly for at least DW_AT_bit_size, and we probably should re-review the entire list). I suppose as an extension it could point to a location list, which would allow different size values based on the current execution address, although the only case I'm familiar with would derive the size from a variable, and so not have to have an actual location list.

DW_AT_byte_size can not be a pointer to a location list, but it could be a exprloc/block kind location, so I guess it is ok to return true for DW_AT_byte_size in function mayHaveLocationDescription(). That function seems check for location descriptions for both exprloc/block kind location and also a location in .debug_loc section. See the function dumpLocation() when mayHaveLocationDescription() returns true.

Hmmm in that case the value is an expression that computes the value, not a location description. There may be some looseness in the terminology in the code that could be tidied up.

For DW_AT_data_member_location at DWARF3(default DWARF version for XCOFF), seems now we can not tell if the FORM_data4/data8 are for loclistptr or for constant. for the test case formclass3-atmemloc.s, it is a constant, so making it as location offset to the .debug_loc causes llvm-dwarfdump emit error message, I think we need to fix this. But if we always treat it as constant, it will also cause issue if the value is indeed an offset to .debug_loc. Maybe we have to find a way to differentiate FORM_data4/data8 for loclistptr and FORM_data4/data8 for constant.

The standard specifies that if an attribute allows one of the '*ptr' classes, then data4/data8 must be interpreted that way. The interpretation is fixed for each attribute. DWARF 3, section 7.5.4 "Attribute Encodings" third paragraph. If the test is using data4 to encode a constant, the test is incorrect. (I haven't really looked at the tests in this patch, except to see that they could probably be reduced quite a bit.)

Esme added a comment.EditedJul 8 2021, 2:04 AM

This means that DW_AT_byte_size can never point to a location list, and it was a bug to have it say it could (similarly for at least DW_AT_bit_size, and we probably should re-review the entire list). I suppose as an extension it could point to a location list, which would allow different size values based on the current execution address, although the only case I'm familiar with would derive the size from a variable, and so not have to have an actual location list.

Similarly, DW_AT_count can never point to a location list, with the same caveat as for DW_AT_byte_size.

I am trying to split this patch into 2 patches.
I have posted the first one D105613 to handle the bug that we try to dump the location list for those attributes that don't have the localist class.

The standard specifies that if an attribute allows one of the '*ptr' classes, then data4/data8 must be interpreted that way. The interpretation is fixed for each attribute. DWARF 3, section 7.5.4 "Attribute Encodings" third paragraph. If the test is using data4 to encode a constant, the test is incorrect. (I haven't really looked at the tests in this patch, except to see that they could probably be reduced quite a bit.)

The test was compiled from a c++ source code, and the overflow error does exist for DW_AT_data_member_location in DWARF3.
Do you mean that we should use other forms to encode a constant in DW_AT_data_member_location?
I will have a look into the issue and post another patch for it.

The standard specifies that if an attribute allows one of the '*ptr' classes, then data4/data8 must be interpreted that way. The interpretation is fixed for each attribute. DWARF 3, section 7.5.4 "Attribute Encodings" third paragraph. If the test is using data4 to encode a constant, the test is incorrect. (I haven't really looked at the tests in this patch, except to see that they could probably be reduced quite a bit.)

The test was compiled from a c++ source code, and the overflow error does exist for DW_AT_data_member_location in DWARF3.
Do you mean that we should use other forms to encode a constant in DW_AT_data_member_location?
I will have a look into the issue and post another patch for it.

Yes, that would be a compiler bug. I agree it's annoying because it means we can't just always use data4, for some attributes.
Note that fixed-size FORMs are preferable to using a LEB, because it helps debugger parsing when DIEs have a fixed size.

Esme abandoned this revision.Jul 11 2021, 7:15 PM

@probinson @shchenz Thanks for your review.
This patch has been split into D105613 and D105687.
Looking forward to your more comments on new patches.