This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Print unprintable characters as unsigned
ClosedPublic

Authored by JDevlieghere on Jun 23 2023, 9:38 AM.

Details

Summary

When specifying the C-string format for dumping memory, we treat unprintable characters as signed. Whether a character is signed or not is implementation defined, but all printable characters are signed. Therefore it's fair to assume that unprintable characters are unsigned.

Before this patch, the newly added unit test would print:

"\xffffffcf\xfffffffa\xffffffed\xfffffffe\f”

With this patch, it prints what you would expect:

“\xcf\xfa\xed\xfe\f”

rdar://111126134

Diff Detail

Event Timeline

JDevlieghere created this revision.Jun 23 2023, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 9:38 AM
JDevlieghere requested review of this revision.Jun 23 2023, 9:38 AM
bulbazord accepted this revision.Jun 23 2023, 10:24 AM

Makes sense to me. Does it make sense to add a test case where we mix "printable" characters with "non-printable" ones? I see you have a \f at the end there, but that's a special escaped character. Something like "\xef\xbb\xbfHello World!\n".

This revision is now accepted and ready to land.Jun 23 2023, 10:24 AM

Makes sense to me. Does it make sense to add a test case where we mix "printable" characters with "non-printable" ones? I see you have a \f at the end there, but that's a special escaped character. Something like "\xef\xbb\xbfHello World!\n".

Good idea, I'll add that before landing this.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 11:00 AM

It seems like this patch is really avoiding the sign extension of an int8_t to int32_t in the process of being passed to printf. By casting it to unsigned we've avoided the sign extension and getting the correct result, but it seems like you could have used a printf formatter like s.Printf("\\x%2.2hhx", c);. I'm not opposed to solving it this way, but the description of why it's being cast to unsigned is not really clear imo.

It seems like this patch is really avoiding the sign extension of an int8_t to int32_t in the process of being passed to printf. By casting it to unsigned we've avoided the sign extension and getting the correct result, but it seems like you could have used a printf formatter like s.Printf("\\x%2.2hhx", c);. I'm not opposed to solving it this way, but the description of why it's being cast to unsigned is not really clear imo.

Good point. Went with that in c0045a8e8e0c72a0c8be3a9c333885da4d14d472.