This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] Fix printing format
ClosedPublic

Authored by paulsemel on Jun 18 2018, 3:36 AM.

Details

Summary

We were printing every character, even those that weren't printable. It doesn't really make sense for this option.

The string content was sticked to its address, added two spaces in between (this matches GNU readelf behavior)

Diff Detail

Repository
rL LLVM

Event Timeline

paulsemel created this revision.Jun 18 2018, 3:36 AM

Could you add a test case?

paulsemel updated this revision to Diff 151930.Jun 19 2018, 9:30 AM

Added tests

dblaikie added inline comments.Jun 20 2018, 1:38 PM
test/tools/llvm-readobj/print-section.test
2–20

Is any of this testing/demonstrating the unprintable character support? It doesn't immediately look like it to me? (I'm guessing/would be expecting some "...." sort of regions)

paulsemel added inline comments.Jun 20 2018, 4:36 PM
test/tools/llvm-readobj/print-section.test
2–20

No, but I'm not sure that it really makes sense to have such a test, as it would result to :

  • Form a not realistic binary
  • Dump a binary section and have a LOT (almost every) of unprintable characters What do you think ?
dblaikie added inline comments.Jun 21 2018, 12:41 PM
test/tools/llvm-readobj/print-section.test
2–20

Yep, dumping a non-textual section. But it could be a small section - or a manually created section (built from assembly with some text and non-text).

This test would pass with or without your change - so that's not sufficient to validate that this change has the intended effect.

paulsemel updated this revision to Diff 152372.Jun 21 2018, 1:49 PM
paulsemel marked an inline comment as done.

Added a test with non printable characters.

*looks at this a bit more closely*

Ah, OK, so this is a string dumping mode - why is it dumping non-printable characters at all? (I assumed this was reimplementing/matching behavior of some existing tool (in the same way that llvm-objdump is meant to be like the binutils 'objdump' tool - I just assumed there was a 'readobj' tool out there somewhere, but googling around I don't immediately see anything like that)) more like the way the 'strings' tool would work?

Or is this trying to match some existing/similar behavior elsewhere?

*looks at this a bit more closely*

Ah, OK, so this is a string dumping mode - why is it dumping non-printable characters at all? (I assumed this was reimplementing/matching behavior of some existing tool (in the same way that llvm-objdump is meant to be like the binutils 'objdump' tool - I just assumed there was a 'readobj' tool out there somewhere, but googling around I don't immediately see anything like that)) more like the way the 'strings' tool would work?

Or is this trying to match some existing/similar behavior elsewhere?

Actually, you were close, this is readelf from GNU Binutils (that's what my GSoC is about :) ).
This option exists in this tool, with the same name.
They are printing non-printable characters a little bit different, but I find this way better :)

dblaikie added inline comments.Jun 26 2018, 2:25 PM
tools/llvm-readobj/ELFDumper.cpp
4281–4284

Maybe factor this out into a utility:

char ToPrintable(char X) { 
  return isprint(X) ? X : '.';
}

and/or pulling out the whole loop into a utility, since it's duplicated in these two places:

void PrintAsPrintable(ScopedPrinter& W, StringRef S) {
  for (char C : S)
    W.startLine() << (isprint(C) ? C : '.');
}
...
PrintAsPrintable(W, StringRef(CurrentWord, WordSize));
paulsemel marked an inline comment as done.

Refactor printing function.

dblaikie accepted this revision.Jun 29 2018, 11:48 AM

Probably worth generalizing over the other formats too (again, refactoring as much as possible into common helper functions - guessing most of the contents of "printSectionAsString" could be generalized over all the different formats without duplicating it in each one. But that can be done in follow-up patches.

This revision is now accepted and ready to land.Jun 29 2018, 11:48 AM
This revision was automatically updated to reflect the committed changes.
paulsemel updated this revision to Diff 153656.Jul 1 2018, 2:59 PM

I got a build fail here : http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/10648
As for now, I still don't know why I'm getting this error, but I guessed it was because I was passing the raw_stream..
This should fix this, I hope

paulsemel reopened this revision.Jul 1 2018, 2:59 PM
This revision is now accepted and ready to land.Jul 1 2018, 2:59 PM

I got a build fail here : http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/10648
As for now, I still don't know why I'm getting this error, but I guessed it was because I was passing the raw_stream..
This should fix this, I hope

I'd guess it's pretty unlikely that it's the passing of a raw_ostream by reference - that's a pretty common thing. To my mind, the code that looks more likely to be crashing there is that maybe the StringRef is out-of-bounds and the range-based for loop is running off the end of the underlying buffer.

Have you tried running the test under valgrind? asan? msan? things like that. You might be able to reproduce/identify the crash on your machine so you can validate a fix rather than throwing it at the buildbots again.

paulsemel added a comment.EditedJul 2 2018, 3:03 PM

I got a build fail here : http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/10648
As for now, I still don't know why I'm getting this error, but I guessed it was because I was passing the raw_stream..
This should fix this, I hope

I'd guess it's pretty unlikely that it's the passing of a raw_ostream by reference - that's a pretty common thing. To my mind, the code that looks more likely to be crashing there is that maybe the StringRef is out-of-bounds and the range-based for loop is running off the end of the underlying buffer.

Have you tried running the test under valgrind? asan? msan? things like that. You might be able to reproduce/identify the crash on your machine so you can validate a fix rather than throwing it at the buildbots again.

Well, I ran it for the first patch. This one, I'm just refactoring, nothing has changed for the bounds right ?
If the first patch passed the buildbots, that means there was no problem about bounds (and as I said, I ran asan on it).
I will do it again tomorrow, maybe the but was not trigger before and is now, but I honestly doubt it :)

Which was the first patch that passed the buildbots? Could you
point/mention the commit revision? & then the commit revision that failed?
I hadn't realized a version had been committed before.

paulsemel added a comment.EditedJul 2 2018, 3:09 PM

Sure: https://reviews.llvm.org/D47989
As you can see, the bounds haven't changed. That's why I really don't understand the error (and honestly, the error could have been more verbose...)
Here is the version I pushed: https://reviews.llvm.org/rL336058

Just to end this out of bounds possibility : in the failing test, the text section is at the middle of the binary. So, even if I was going out-of-bounds, there shouldn't be corruption.
I'm completely lost with what is going wrong with this one..

Rebased the failing version.

amccarth added inline comments.
tools/llvm-readobj/ELFDumper.cpp
3225

The behavior of isprint is undefined if the value is not representable as an unsigned char. Since char on Windows is signed (by default), this means that some bytes could be negative, which is not representable and an unsigned char.

I think the safest thing to do is to cast C to unsigned char.

paulsemel added inline comments.Jul 11 2018, 1:36 AM
tools/llvm-readobj/ELFDumper.cpp
3225

Hmm I'm perplex.. Doesn't isprint take an int as parameter ? I mean, if it takes an int in parameter, I don't understand why it could lead to undefined behavior if we give it a negative number...
Can you elaborate on this one, I'm really curious to learn where is the problem :)

amccarth added inline comments.Jul 11 2018, 8:41 AM
tools/llvm-readobj/ELFDumper.cpp
3225

Yes, isprint takes an int so that it can distinguish between a character and EOF, but EOF is the only valid negative value to pass to it.

"The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF." https://en.cppreference.com/w/cpp/string/byte/isprint

Let's say you have the character value 193 (0xC1). On Windows, char is signed and thus a char variable representing that character value is effectively -63. When passed to isprint, it's promoted to an int, you'll end up with -63 instead of 193 which puts you into the realm of undefined behavior.

If you first cast to an unsigned char, the 193 is preserved when cast to an int, and everything should be hunky dory.

Note that EOF is a negative value, usually (always?) -1, so it would alias with an actual character value if we were allowed to use negative character values.

Now, in practice, this may not manifest as a detectable bug because (1) the isprint implementation might do something reasonable and/or (2) if <locale> has been included, then you get a convenience function templated on the actual character type that does the cast for you (§ 22.3.3.1). Then again, this call is isprint rather than std::isprint, so whether it resolves to the convenience function depends on whether <locale> dumps all its symbols into the global namespace as well as into std.

paulsemel closed this revision.Jul 18 2018, 3:32 AM

Closed due to to duplicate

Has this been recommitted yet?

I added llvm::isPrint in r338034 which should help fix this issue.