This is an archive of the discontinued LLVM Phabricator instance.

[Object][COFF] Fix section name parsing error when the name field is not null-padded
ClosedPublic

Authored by pzheng on Jun 8 2022, 6:27 PM.

Details

Summary

Some object files produced by Mirosoft tools contain sections whose name field
is not fully null-padded at the end. Microsoft's dumpbin is able to print the
section name correctly, but this causes parsing errors with LLVM tools.

So far, this issue only seems to happen when the section name is longer than 8
bytes. In this case, the section name field contains a slash (/) followed by the
offset into the string table, but the name field is not fully null-padded at the
end.

Diff Detail

Event Timeline

pzheng created this revision.Jun 8 2022, 6:27 PM
Herald added a project: Restricted Project. · View Herald Transcript
pzheng requested review of this revision.Jun 8 2022, 6:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 6:27 PM
mstorsjo accepted this revision.Jun 8 2022, 11:57 PM

The change itself LGTM

llvm/test/tools/llvm-objdump/COFF/long-section-name.test
20

As the replacement is hardcoded to contain /4 here, I think it feels slightly inconsistent to take the pattern /4 as a parameter. This test wouldn't really work with any other parameter than /4 (even if the symbol would constain a different offset than 4, then the replacement pattern would need to have the same offset too.

27

I was wondering about whether the test is slightly brittle - if the pattern isn't found at all, I presume that this process would fail somehow (if pos returns as -1). I'd just want to avoid that the test silently keeps passing if the offset turns out different for some reason.

Maybe it'd be safest with a check like if pos == -1: exit(1) to make it clearer?

This revision is now accepted and ready to land.Jun 8 2022, 11:57 PM
pzheng updated this revision to Diff 435610.Jun 9 2022, 10:43 AM

Update the test to address @mstorsjo's comments.

pzheng marked 2 inline comments as done.Jun 9 2022, 10:46 AM
pzheng added inline comments.
llvm/test/tools/llvm-objdump/COFF/long-section-name.test
20

I agree, it's better to just hard code /4 into the test.

27

Makes sense to me too. Thanks for the suggestion!

mstorsjo accepted this revision.Jun 9 2022, 12:19 PM

LGTM, thanks!

llvm/test/tools/llvm-objdump/COFF/long-section-name.test
19

Nitpick (and/or I'm not so well versed in python), couldn't this be just b'/4' like the line below it?

pzheng updated this revision to Diff 435654.Jun 9 2022, 12:52 PM
pzheng marked 2 inline comments as done.

Small update the test

pzheng marked an inline comment as done.Jun 9 2022, 12:54 PM
pzheng added inline comments.
llvm/test/tools/llvm-objdump/COFF/long-section-name.test
19

Good catch! Just updated the test to only use b''.

This revision was landed with ongoing or failed builds.Jun 9 2022, 12:58 PM
This revision was automatically updated to reflect the committed changes.
pzheng marked an inline comment as done.
rnk added inline comments.Jun 9 2022, 2:36 PM
llvm/lib/Object/COFFObjectFile.cpp
1171

I think it's a bug that getAsInteger doesn't work on non-null terminated StringRefs. It's not an invariant that StringRefs are null terminated. We explicitly form a non-null terminated StringRef on line 1161 above.

pzheng added inline comments.Jun 9 2022, 3:23 PM
llvm/lib/Object/COFFObjectFile.cpp
1171

hmm..., not sure if getAsInteger is supposed to handle a situation like this, the description of the function isn't very clear, but I tend to agree with you that this could be bug unless there are already code in LLVM which assumes getAsInteger should fail given such input.

pzheng added inline comments.Jun 9 2022, 6:46 PM
llvm/lib/Object/COFFObjectFile.cpp
1171

Looking at the implementation of getAsInteger, it looks like it's actually supposed to fail with such input where only the first part of it is a valid integer. getAsInteger requires the whole string to be consumed or else it's considered as a failure.

efriedma added inline comments.
llvm/lib/Object/COFFObjectFile.cpp
1171

In the testcase, Name.substr(1) contains the value "4\0abcde", i.e. an embedded null. Since '\0' isn't a digit, getAsInteger() is correctly rejecting it. consumeInteger() just stops parsing at '\0'.

I suspect this code shouldn't be passing down a StringRef with embedded nulls, though. Maybe the if (Sec->Name[COFF::NameSize - 1] == 0) check is wrong. (The spec says "null-padded", but maybe in practice Microsoft tools just treat it as "null-terminated".)

pzheng added inline comments.Jun 10 2022, 6:04 PM
llvm/lib/Object/COFFObjectFile.cpp
1171

Interesting point. The object which originally exposed the issue has an embedded null right after "/4" and some other garbage characters after that embedded null. Therefore, I used that in my test case. However, dumpbin is able to parse the section name correctly even if the string does not have an embedded null, e.g., "/4abcdef". So, modifying the if (Sec->Name[COFF::NameSize - 1] == 0) check to not pass down a StringRef with embeded nulls alone is probably not enough to prevent getAsInteger from failing with inputs like "/4abcdef", and we may still end up having to use consumeInteger to ignore the junks after "/4".

mstorsjo added inline comments.Jun 11 2022, 1:15 AM
llvm/lib/Object/COFFObjectFile.cpp
1171

FWIW, even if dumpbin accepts a string like /4abcdef I'm not sure if we strictly need to accept that - if there's no observed cases of such object files in the wild. But tolerating /4\0abcdef seems reasonable to me.

rnk added inline comments.Jun 13 2022, 1:11 PM
llvm/lib/Object/COFFObjectFile.cpp
1171

OK, I think I misunderstood the issue. I think the change is fine as is.