This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj][XCOFF] Fix the error dumping for the first item of StringTable.
ClosedPublic

Authored by Esme on Jul 6 2021, 7:38 PM.

Details

Summary

In D104613, we left a FIXME about the error dumping for the first item of StringTable.
This is not caused by parsing, but by the fact that I didn't realize that the first 4 bytes contain the size of the StringTable.
The patch fixes the bug.

Diff Detail

Event Timeline

Esme created this revision.Jul 6 2021, 7:38 PM
Esme requested review of this revision.Jul 6 2021, 7:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2021, 7:38 PM
Esme updated this revision to Diff 356879.Jul 7 2021, 12:26 AM
Esme edited the summary of this revision. (Show Details)
shchenz added inline comments.Jul 7 2021, 5:19 AM
llvm/tools/llvm-readobj/ObjDumper.cpp
56

Can we add a comment about why we need to set start at StrContent + StringSizeFieldLength?

Also can we add a line at the beginning of the string table to show the size of the string table?

68

The output is shown as hexadecimal data, but there is no "0x" prefix for the data, we should add the prefix or use decimal data?

llvm/tools/llvm-readobj/ObjDumper.h
113

Maybe a more meaningful name would be StringSizeFieldLength?

Esme added inline comments.Jul 7 2021, 8:15 PM
llvm/tools/llvm-readobj/ObjDumper.cpp
56

Also can we add a line at the beginning of the string table to show the size of the string table?

Same as above, the function is not only used by dumping the string table. But yes, I can print the size at XCOFFDumper::printStringTable.

llvm/tools/llvm-readobj/ObjDumper.h
113

The function also called by ObjDumper::printSectionsAsString, which dumps the specified section(s), which doesn't have the field containing string content's length. So I think StringSizeFieldLength may be confused in that case?

Esme updated this revision to Diff 357130.Jul 7 2021, 9:40 PM

Address comments.

Esme added inline comments.Jul 7 2021, 9:41 PM
llvm/tools/llvm-readobj/ObjDumper.cpp
68

Changing the output format will affect many tests on dumping sections. That is out of the scope of this patch. We can post another patch for it if necessary.

qiucf added a subscriber: qiucf.Jul 7 2021, 11:49 PM
qiucf added inline comments.
llvm/tools/llvm-readobj/ObjDumper.h
113

What about BeginOffset? And why it's typed as uint8_t, size_t seems better as offset/length if there's not other reasons.

jhenderson added inline comments.Jul 8 2021, 1:10 AM
llvm/tools/llvm-readobj/ObjDumper.cpp
58–59

I'd change this comment to:

"Some formats contain additional metadata at the start which should not be interpreted as strings. Skip these bytes, but account for them in the string offsets."

Don't describe what the code is doing when the code is already clear. Instead, describe why. I think it's important to explain why you don't just use drop_front priort to calling this function: it's because you want to keep the offsets to represent the real offsets in the string table.

68

Don't add 0x. The output format MUST match that of GNU readelf -p for ELF output at least. If you really want to change XCOFF only, that's fine, but you'll need to do it for that format only.

llvm/tools/llvm-readobj/ObjDumper.h
113

More concretely: I'd change the signature to void printAsStringList(StringRef StringContent, size_t StringDataOffset = 0);

llvm/tools/llvm-readobj/XCOFFDumper.cpp
462

What will happen if you try to call this when there is no string table? XCOFF32 doesn't require one as I understand it.

463–465

I'm not opposed to the length field being here, but it should be a separate patch.

464
465

I assume that the length includes the length field itself?

Esme updated this revision to Diff 357855.Jul 12 2021, 12:48 AM

Addressed comments.

Esme added inline comments.Jul 12 2021, 12:48 AM
llvm/tools/llvm-readobj/XCOFFDumper.cpp
462

If there is no string table, we will get an empty string table, where the data pointer is nullptr and the size is zero.

465

Yes, your comment is accurate. Thx!

jhenderson added inline comments.Jul 12 2021, 4:11 AM
llvm/lib/Object/XCOFFObjectFile.cpp
193 ↗(On Diff #357855)

Looks like you may need a test case for a single-byte sized table!

Also, I think you need the following two test cases:

  1. Empty string table with just the length field.
  2. String table which is too short for the length field.

The latter assumes there's a yaml2obj implementation that supports custom string table data. If it doesn't exist, consider adding it, but it's not essential if there isn't.

Esme updated this revision to Diff 358176.Jul 12 2021, 11:54 PM

Added tests.

  1. A single-byte sized string entry.
  2. Empty string table with just the length field.
jhenderson added inline comments.Jul 13 2021, 12:48 AM
llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
1 ↗(On Diff #358176)

These two test cases should be part of the llvm-readobj string table printing test, not here: you're testing llvm-readobj (including the XCOFFObjectFile code, via llvm-readobj), not yaml2obj.

Update the comments accordingly too.

30–31 ↗(On Diff #358176)

I don't think you need any symbols here to ensure an empty string table.

Esme updated this revision to Diff 358191.Jul 13 2021, 1:29 AM

Address comments.

llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
30–31 ↗(On Diff #358176)

The absence of any symbols means that there is no string table, but the case tries to display an empty string table, which has no string data but the length field.

jhenderson added inline comments.Jul 15 2021, 1:06 AM
llvm/test/tools/llvm-readobj/XCOFF/string-table.yaml
1 ↗(On Diff #358191)

Apaprently I missed in the other patch that there wasn't an llvm-readobj test case, even though there should have been.

Can you extend this test file to include a case where there is more than one symbol name, please?

7 ↗(On Diff #358191)

I don't think you need this line. Same goes below.

9 ↗(On Diff #358191)

Use -NEXT suffixes on your check patterns. This test in its current form will also pass with the following output:

StringTable {
<some garbage>
   [     4]   n
<more garbage>
}

Also, delete trailing spaces.

Same comments below.

30 ↗(On Diff #358191)

Why does 32-bit XCOFF generate an empty string table if it isn't used? Is that a bug in the emitter, or a spec detail?

Esme updated this revision to Diff 359227.Jul 15 2021, 11:44 PM

Addressed comments.

llvm/test/tools/llvm-readobj/XCOFF/string-table.yaml
30 ↗(On Diff #358191)

Hmm...I think it's neither a spec detail, nor a bug? That's just how we implement in yaml2obj. We write the string table if the symbol table isn't empty.

if (!Obj.Symbols.empty()) {
    if (!writeSymbols())
      return false;
    // Write the string table.
    Strings.write(W.OS);
}

Due to the fact that only long symbol name can be written into the string table for 32-bit XCOFF, I write the test of non-empty symbol table without long symbol names.
As for 64-bit XCOFF, the symbol name is a required field and all symbol names will be written into the string table, so I can't write such test under 64-bit.

jhenderson added inline comments.Jul 16 2021, 12:41 AM
llvm/test/tools/llvm-readobj/XCOFF/string-table.yaml
55 ↗(On Diff #359227)

You don't need the too-short-to-be-in-the-table name, because that's testing yaml2obj's emission of the string table, not llvm-readobj's dumping of it.

30 ↗(On Diff #358191)

Okay, I'd call that a yaml2obj bug (yaml2obj shouldn't be creating things that aren't needed - doing so wastes time and disk space). That being said, it would be a useful feature if you could customise the presence and/or contents of the string table directly in yaml2obj. Something like the following:

## Emit a string table with specific contents. Custom contents overwrite implicitly generated contents.
--- !XCOFF
FileHeader:
  ...
StringTable:
  - Length: 10 # Force string table length field to be set to this value.
  - Strings: ["a", "b", "cd"] # Force contetns to this.
Symbols: # Optional.
  # Names use values as-if string table were implicitly generated (might be invalid).

## Don't emit a string table.
--- !XCOFF
FileHeader:
  ...
# No StringTable or Symbols element.

## Emit a string table with implicit contents, if and only if required (i.e. 32-bit long name or any 64-bit).
--- !XCOFF
FileHeader:
  ...
Symbols:
  ...

## Suppress emission of string table.
--- !XCOFF
FileHeader:
  ...
StringTable:
  Emit: false # Or perhaps Suppress: true
Symbols:
  ... # Names generated with appropriate offsets as if table emission was not suppressed.

This is similar in concept to how ELF yaml2obj allows changing the contents and even presence of the .strtab section (which contains symbol names), and allows for testing of more code paths (e.g. checking that a symbol name offset is valid, printing of a string table with an invalid length field etc).

Esme updated this revision to Diff 360358.Jul 20 2021, 10:21 PM

Updated test.
Thanks James!
I have post the patch D106420 to solve the issue we mentioned and will post a following patch to add the feature of customizing the string table directly in yaml2obj.

jhenderson accepted this revision.Jul 23 2021, 1:51 AM

LGTM, with a couple of minor fixes.

llvm/test/tools/llvm-readobj/XCOFF/string-table.yaml
28 ↗(On Diff #360358)

I'd put this test case first, then the single-byte case (because it's testing an edge case), and the empty case last.

llvm/tools/llvm-readobj/XCOFFDumper.cpp
463–466

Careful of off-by-one errors! Byte 0 is the first byte...

This revision is now accepted and ready to land.Jul 23 2021, 1:51 AM
This revision was landed with ongoing or failed builds.Aug 3 2021, 2:10 AM
This revision was automatically updated to reflect the committed changes.

Without looking too hard, my guess is that the problem is caused by an empty StringRef returning nullptr from bytes_begin() (or some equivalent behaviour caused further up the chain that leads to that situation implicitly). You should probably add an early out in case the string is empty. If there's still an issue after that fix, then there's something returning a non-empty StringRef starting from the nullptr, which is clearly broken, but is likely further up the stack in terms of where the problem occurs.

Esme added a comment.Aug 4 2021, 12:36 AM

I've committed rG737e27f6236f after rG3df1e7e6f05e to add an early out in case the string is empty.