This is an archive of the discontinued LLVM Phabricator instance.

Add the ability to dump hex bytes with optional offsets and ASCII bytes.
ClosedPublic

Authored by clayborg on Nov 8 2016, 9:43 AM.

Details

Summary

There are sure to be many places where a buffer of bytes need to be dumped in LLVM. This change allows users to quickly dump a buffer of bytes with optional offset prefix and ASCII display:

Example output for raw hex bytes:
55 48 89 e5 48 81 ec 70 04 00 00 48 8d 05 10 02
00 00 4c 8d 05 fd 01 00 00 4c 8b 0d d0 02 00 00

Example output for raw hex bytes with ASCII:
55 48 89 e5 48 81 ec 70 04 00 00 48 8d 05 10 02 UH.?H.?p...H....
00 00 4c 8d 05 fd 01 00 00 4c 8b 0d d0 02 00 00 ..L..?...L..?...

Example output for raw hex bytes with ASCII and offsets/addresses:
0x100000d10: 55 48 89 e5 48 81 ec 70 04 00 00 48 8d 05 10 02 UH.?H.?p...H....
0x100000d20: 00 00 4c 8d 05 fd 01 00 00 4c 8b 0d d0 02 00 00 ..L..?...L..?...

Added unit tests as well.

Diff Detail

Event Timeline

clayborg updated this revision to Diff 77207.Nov 8 2016, 9:43 AM
clayborg retitled this revision from to Add the ability to dump hex bytes with optional offsets and ASCII bytes..
clayborg updated this object.
clayborg set the repository for this revision to rL LLVM.

The output looks better with a mono spaced font of course.

zturner edited edge metadata.Nov 8 2016, 10:25 AM

Just a tip. If you wrap the block in triple backticks it will show up in a fixed-width font in phabricator (see below for an example)

That aside, this looks pretty good, but we have this functionality in a couple of places in LLVM already. It would be nice to centralize. Also, this is just a personal preference, but I kind of prefer the way the existing implementations format this. Maybe you could look at the existing implementations and if you prefer it as well, move the code for them over here?

Here's a sample of what our current output looks like:

bin\llvm-pdbdump raw -stream-data=1 d:\src\llvm\test\debuginfo\pdb\inputs\empty.pdb
...
      0000: 942E3101 E207E554 01000000 0B355641  |..1....T.....5VA|
      0010: 86A0A249 896F9988 FAE52FF0 22000000  |...I.o..../."...|
      0020: 2F4C696E 6B496E66 6F002F6E 616D6573  |/LinkInfo./names|
      0030: 002F7372 632F6865 61646572 626C6F63  |./src/headerbloc|
      0040: 6B000300 00000600 00000100 00001A00  |k...............|
      0050: 00000000 00001100 00000900 00000A00  |................|
      0060: 00000D00 00000000 00000500 00000000  |................|
      0070: 00004191 3201                        |..A.2.|

The code for this is in ScopedPrinter::printBinaryImpl.

Even if we don't combine the two right away, I still prefer this style, so it would be nice if the API matched this style closely enough that it would be possible to have printBinaryBlock delegate to your function in the future with only minor changes.

Some limitations of the existing implementation:

  1. Doesn't allow configurable line length. It's always 16-bytes per line.
  2. Doesn't allow configurable field-width. Always 4-bytes per field.
  3. Doesn't allow prefixing the address with 0x or left-padding with zeros.

Feel free to take on as much or as little of this as you think is useful. But I think we should at least be consistent with the output style.

lib/Support/raw_ostream.cpp
370

This will need to be an uppercase I.

388

Same here.

unittests/Support/NativeFormatTests.cpp
48 ↗(On Diff #77207)

I know it's a poorly named file, but these tests should probably go in raw_ostream_test.cpp. NativeFormatTests.cpp was intended specifically to test the functions in NativeFormatting.cpp.

mehdi_amini edited edge metadata.Nov 8 2016, 11:16 AM

Cool feature!
I only have minor comments inline.

include/llvm/Support/Format.h
208

This wasn't clear to me until I read the code in raw_ostream that this is the "starting offset" for the offset to print in the margin. I thought it was about indention.

223

I'm not against the "type erasure" with void *, but what about also keeping an overload with an ArrayRef of uint8_t? It could avoid to have to unpack at call sites.

lib/Support/raw_ostream.cpp
373

Early exit:

if (Index >= Size)
  break;
390

This loop could iterate from 0 to i and you wouldn't need this test here.
Repeating the same early exit as the previous loop might be easier to read though.

zturner added inline comments.Nov 8 2016, 11:21 AM
include/llvm/Support/Format.h
223

I kind of am. Is there a reason why we need to specify void* P, size_t N when we can just specify ArrayRef<uint8_t>?

clayborg updated this revision to Diff 77246.Nov 8 2016, 1:01 PM
clayborg edited edge metadata.
clayborg removed rL LLVM as the repository for this revision.

Fixed all of the previous inline comments:

  • moved unittest to right file
  • added support for byte groups so we can display 4 bytes at a time with no spaces by default
  • added '|' delimiters so the ASCII looks better when displayed and matches other clients
  • fixed early exits from loops
  • used ArrayRef<uint8_t> instead of void * and size_t.
clayborg marked 5 inline comments as done.Nov 8 2016, 1:02 PM

Updates Done status on inlined comments.

mehdi_amini accepted this revision.Nov 8 2016, 1:05 PM
mehdi_amini edited edge metadata.

LGTM now. Wait a little to see if @zturner has any other comment.

include/llvm/Support/Format.h
223

I see a reason: it avoids casting at call site when you don't have uint8_t (for instance if you have a ArrayRef<int8_t>).

(However is it dangerous because N here is the number of bytes and if you have an ArrayRef<int> Arr it is too easy to pass Arr.size() for N)

This revision is now accepted and ready to land.Nov 8 2016, 1:05 PM

Mostly looks good, just a few nits and I think we're good.

include/llvm/Support/Format.h
208

This should probably be a size_t. Also, maybe it can be an Optional<size_t>, which seems to more closely model what we want to use it for.

230–235

It's a bit confusing to me that there are two almost identical versions of the function with different default values for the parameters. What about having a single version that takes all parameters? If so, I would suggest the signature be something like:

format_hex_bytes(ArrayRef<uint8_t>, bool ShowAsciiBlock = true, uint32_t BytesPerLine=16, uint8_t BytesPerGroup = 4, bool Upper = true, Optional<size_t> FirstByteOffset = 0).

I'm not sure if this is the optimal ordering of parameters, but I tried to arrange this in order from most commonly specified to least commonly specified. Not sure if I got it right though.

If you think it's better to have two versions of the functions, at least make the default values of the arguments be the same.

mehdi_amini added inline comments.Nov 8 2016, 1:32 PM
include/llvm/Support/Format.h
230–235

Which default argument value is different between the two free functions? I think they are identical.

The default is different on the class ctor though, and it may even be possible to remove the default there as the two free functions are making the uses "nice" enough.

clayborg updated this revision to Diff 77261.Nov 8 2016, 2:21 PM
clayborg edited edge metadata.

Took care of requested fixes.

Let me know if things look good Zach.

include/llvm/Support/Format.h
208

Good idea on the optional. It probably won't use a size_t since that can be 32 bits on some systems and might prevent you from showing a 64 bit address/offset in your output.

231

I think two functions is clearer. I did remove the defaulted args in the ctor. Both default args are the same for format_hex_bytes and format_hex_bytes_with_ascii.

zturner accepted this revision.Nov 8 2016, 2:29 PM
zturner edited edge metadata.

My only other suggestion is to terminate the output when the last byte is printed rather than printing spaces. Since isprint(' ') will be true, you won't be able to distinguish trailing spaces in the input from simply running out of bytes. For example, if you tried to print the string " " then your ascii would just look like | |, and it doesn't provide any value. So I would print that like this instead: | |. Even if the last row doesn't line up, I think that's more clear. For example, the code I linked to earlier prints this:

0000: 942E3101 E207E554 01000000 0B355641  |..1....T.....5VA|
0010: 86A0A249 896F9988 FAE52FF0 22000000  |...I.o..../."...|
0020: 2F4C696E 6B496E66 6F002F6E 616D6573  |/LinkInfo./names|
0030: 002F7372 632F6865 61646572 626C6F63  |./src/headerbloc|
0040: 6B000300 00000600 00000100 00001A00  |k...............|
0050: 00000000 00001100 00000900 00000A00  |................|
0060: 00000D00 00000000 00000500 00000000  |................|
0070: 00004191 3201                        |..A.2.|

And that makes it clear from the ASCII that the byte sequence terminates there instead of having some trailing spaces in it.

Feel free to commit after that.

include/llvm/Support/Format.h
208

Ahh that's a good point.

clayborg closed this revision.Nov 8 2016, 4:26 PM

I fixed Zach's request to remove the extra spaces in the ASCII and committed with 286316.