
[sanitizer-common] Reduce ANSI color sequences that have no effect.
Needs Revision · Public

Authored by aarongreen on Jun 11 2019, 5:16 PM.



sanitizer_common's consumers use SanitizerCommonDecorator to color their output in various cases. In some cases they end up emitting several escape sequences in a row that have no effect; e.g. PrintShadowBytes will transition to bold+color, print a byte, transition to default, transition back to bold+color, etc. These extra transitions end up being passed to the RawWrite call in … . If the implementation of RawWrite has a fixed buffer size (e.g. as in …), this can cause the buffer to be consumed unnecessarily and can even end up breaking the line in the middle of an escape sequence.
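The fixed-buffer failure mode can be illustrated in isolation. This sketch assumes a writer that forwards output in pieces of at most chunk_size bytes; SplitIntoChunks and SplitsEscapeSequence are hypothetical helpers, not sanitizer_common APIs. It reports whether any piece ends partway through a "\033[...m" sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a RawWrite implementation that can pass at most
// chunk_size bytes per call: the output is cut into fixed-size pieces.
std::vector<std::string> SplitIntoChunks(const std::string &s,
                                         std::size_t chunk_size) {
  assert(chunk_size > 0);  // a zero size would never make progress
  std::vector<std::string> chunks;
  for (std::size_t i = 0; i < s.size(); i += chunk_size)
    chunks.push_back(s.substr(i, chunk_size));
  return chunks;
}

// True if any chunk except the last ends inside an escape sequence, i.e. its
// last ESC is not followed by the terminating 'm' within the same chunk.
bool SplitsEscapeSequence(const std::vector<std::string> &chunks) {
  for (std::size_t c = 0; c + 1 < chunks.size(); ++c) {
    const std::string &chunk = chunks[c];
    std::size_t esc = chunk.rfind('\033');
    if (esc != std::string::npos && chunk.find('m', esc) == std::string::npos)
      return true;
  }
  return false;
}
```

For "\033[1m\033[31mf\033[0m" (14 bytes), 6-byte pieces cut the second sequence right after "\033[", while 9-byte pieces happen to end exactly on a sequence boundary. Fewer redundant sequences means fewer bytes, and so fewer chances for such a cut.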

This change attempts to mitigate that problem by providing SanitizerCommonDecorator::Compact, which removes sequences that would simply be overwritten by subsequent sequences.
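A minimal standalone sketch of the compaction idea, assuming the only sequences involved are \033[0m (reset), \033[1m (bold) and the foreground colors \033[30m through \033[37m, and that the terminal starts in its default state. Compact, SgrState, ParseSgr, Apply and EmitTransition are illustrative names, not the code in this diff: each run of consecutive sequences is folded into the state it requests, and only the transition from the state already emitted to that state is written before the next visible character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Terminal state touched by the sequences modelled here: a bold flag and a
// foreground color (0 = terminal default, otherwise an SGR code 30..37).
struct SgrState {
  bool bold = false;
  int color = 0;
  bool operator==(const SgrState &o) const {
    return bold == o.bold && color == o.color;
  }
};

// If s[i..] starts with "\033[<n>m" and n is 0, 1 or 30..37, stores n in
// *code, advances i past the 'm' and returns true. Otherwise leaves i alone.
static bool ParseSgr(const std::string &s, std::size_t &i, int *code) {
  if (s.compare(i, 2, "\033[") != 0) return false;
  std::size_t j = i + 2;
  int n = 0;
  std::size_t digits = 0;
  while (j < s.size() && digits < 3 && s[j] >= '0' && s[j] <= '9') {
    n = n * 10 + (s[j] - '0');
    ++j;
    ++digits;
  }
  if (digits == 0 || j >= s.size() || s[j] != 'm') return false;
  if (!(n == 0 || n == 1 || (n >= 30 && n <= 37))) return false;
  *code = n;
  i = j + 1;
  return true;
}

static void Apply(SgrState &st, int code) {
  assert(code == 0 || code == 1 || (code >= 30 && code <= 37));
  if (code == 0)
    st = SgrState();  // reset clears both bold and color
  else if (code == 1)
    st.bold = true;
  else
    st.color = code;  // a new color replaces the previous one
}

// Appends the sequences that move the terminal from `from` to `to` in this
// model. Nothing is written when the states already match.
static void EmitTransition(std::string &out, const SgrState &from,
                           const SgrState &to) {
  if (from == to) return;
  SgrState cur = from;
  // In this model bold and color can only be switched off by a full reset.
  if ((cur.bold && !to.bold) || (cur.color != 0 && to.color == 0)) {
    out += "\033[0m";
    cur = SgrState();
  }
  if (to.bold && !cur.bold) out += "\033[1m";
  if (to.color != cur.color) out += "\033[" + std::to_string(to.color) + "m";
}

// Collapses each run of consecutive escape sequences into the transition
// needed before the next visible character.
std::string Compact(const std::string &in) {
  std::string out;
  SgrState emitted;  // assumed: the terminal starts in the default state
  SgrState pending;  // state requested by the sequences seen so far
  std::size_t i = 0;
  while (i < in.size()) {
    int code = 0;
    if (ParseSgr(in, i, &code)) {
      Apply(pending, code);
      continue;
    }
    EmitTransition(out, emitted, pending);
    emitted = pending;
    out += in[i++];
  }
  EmitTransition(out, emitted, pending);  // keep a trailing change, e.g. a final reset
  return out;
}
```

With this model, the bold+color, default, bold+color pattern from the summary, "\033[1m\033[31mf\033[0m\033[1m\033[31ma\033[0m", compacts to "\033[1m\033[31mfa\033[0m". Tracking the state already emitted, rather than only dropping sequences overwritten within a single run, is what lets the default/bold+color round trip between bytes disappear entirely.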


Event Timeline

aarongreen created this revision. Jun 11 2019, 5:16 PM
Herald added a project: Restricted Project. Jun 11 2019, 5:16 PM
Herald added subscribers: llvm-commits, Restricted Project, mgorny, kubamracek.
phosek added inline comments. Jun 16 2019, 10:17 PM
22 ↗(On Diff #204195)

Why not internal_strncmp(s, "\033[0", 3) != 0?

119 ↗(On Diff #204195)

nit: space between for and (

148 ↗(On Diff #204195)

Can you use braces for else for consistency?

192 ↗(On Diff #204195)

nit: separate with empty line from the class declaration

aarongreen marked 4 inline comments as done.

Addresses phosek's comments.

22 ↗(On Diff #204195)

There's a difference between '0' and '\0'.

vitalybuka added inline comments.

please clang-format

vitalybuka added inline comments.

just move ansi_ check into ToStr()

vitalybuka added inline comments. Jul 30 2019, 4:50 PM
22 ↗(On Diff #212462)

let's consume chars to simplify indexing:

static const char *getCode(const char *s, char *code) {
  if (internal_strncmp(s, "\033[", 2) != 0 || s[2] == '\0' || s[3] != 'm')
    return nullptr;
  *code = s[2];
  return s + 4;
}

SanitizerCommonDecorator::ToKind(const char *s) {
  char c;
  if (!(s = getCode(s, &c)))
    return kUnknown;

  if (!(s = getCode(s, &c)))
    return kBold;

  switch (c) {
    ...

89 ↗(On Diff #212462)

please remove SanitizerCommonDecorator::StrLen or replace it with internal_strlen(ToKind())

107 ↗(On Diff #212462)


129 ↗(On Diff #212462)

why do you need to check for "*s < 0x7f"?

138 ↗(On Diff #212462)

if (z != s) can be omitted

196 ↗(On Diff #212462)


225 ↗(On Diff #212462)
229 ↗(On Diff #212462)

this test logic is quite fancy; it's going to be harder to debug when it detects a bug.
I would prefer simple tests where it's easy to see from the code what the actual input and expected output were:

EXPECT_STREQ(GetParam().input, GetParam().expected);
vitalybuka requested changes to this revision. Aug 2 2019, 12:23 PM
This revision now requires changes to proceed. Aug 2 2019, 12:23 PM
aarongreen marked 6 inline comments as done.

Addressed vitalybuka's comments. The end result is a bit smaller and more elegant than before. I did end up needing to move some code for ColorizeReport (defined in sanitizer_common.h) in order to avoid causing problems in scudo's small-binary build.

aarongreen marked an inline comment as done. Aug 14 2019, 10:40 AM
aarongreen added inline comments.
22 ↗(On Diff #212462)

That doesn't quite work: the colors are 2 chars, e.g. 31.

I've made getCode use internal_simple_strtoll to return an s64 instead, and advance a const char **s_ptr. That way, each call consumes a '\033[...m' sequence and returns the number represented by '...'.

I can't promise it's faster, but hopefully it's clearer?
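For illustration, a standalone version of the getCode design described above. std::strtoll stands in for internal_simple_strtoll, and the name GetCode and its -1 failure value are assumptions of this sketch rather than the code in the diff.

```cpp
#include <cassert>
#include <cstdlib>

// If *s_ptr starts with "\033[<digits>m", returns the number and advances
// *s_ptr past the 'm'; otherwise returns -1 and leaves *s_ptr unchanged.
// std::strtoll is used here in place of sanitizer_common's
// internal_simple_strtoll.
long long GetCode(const char **s_ptr) {
  assert(s_ptr != nullptr && *s_ptr != nullptr);
  const char *s = *s_ptr;
  if (s[0] != '\033' || s[1] != '[') return -1;
  // Require a digit up front: strtoll alone would also accept spaces and signs.
  if (s[2] < '0' || s[2] > '9') return -1;
  char *end = nullptr;
  long long code = std::strtoll(s + 2, &end, 10);
  if (*end != 'm') return -1;
  *s_ptr = end + 1;  // consume the whole sequence, including the 'm'
  return code;
}
```

Calling it repeatedly on "\033[1m\033[0m" yields 1, then 0, leaving the pointer at the terminating NUL; on plain text it returns -1 without moving the pointer, so a caller can loop until it sees -1.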

129 ↗(On Diff #212462)

0x7f is DEL, and anything above it isn't ASCII. The printable characters other than space run from 0x21 (!) to 0x7e (~).
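The range check in question, written as a hypothetical standalone predicate. For ASCII input it agrees with std::isgraph in the "C" locale; the static_asserts pin down the boundaries discussed here.

```cpp
// True for '!' (0x21) through '~' (0x7e): the printable ASCII characters
// other than space. DEL (0x7f), bytes above it, control characters such as
// ESC, and space itself all fail the check.
constexpr bool IsGraphicAscii(unsigned char c) { return c >= 0x21 && c <= 0x7e; }

static_assert(IsGraphicAscii('!') && IsGraphicAscii('~'), "range endpoints");
static_assert(!IsGraphicAscii(' ') && !IsGraphicAscii(0x7f), "space and DEL");
```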

196 ↗(On Diff #212462)

seplling is hrad.

vitalybuka added inline comments. Aug 14 2019, 12:44 PM

INLINE is not necessary

can this be?

return SANITIZER_FUCHSIA || report_file.SupportsColors();

should these be kDefault?

DecorationKind prev = kUnknown;
DecorationKind next = kUnknown;

the code uses "code = kUnknown;" only to break the loop.
If we wrap this into a function:

const char *getCode(const char *&in, DecorationKind &next) {
  while (true) {
    if (internal_strncmp(in, "\033[", 2) != 0)
      return in;
    const char *endptr = in + 2;
    s64 code = internal_simple_strtoll(endptr, &endptr, 10);
    if (endptr <= in + 2 || *endptr++ != 'm')
      return in;

    switch (code) {
      case kDefault:
      case kBold:
      case kBlack:
      case kRed:
      case kGreen:
      case kYellow:
      case kBlue:
      case kMagenta:
      case kCyan:
      case kWhite:
        if (next == kBold)
          return in;
        next = static_cast<DecorationKind>(code);
        ...

we don't need kUnknown at all


why must next be kBold?


increment "endptr" right after the != 'm' check, so we know where the 1 comes from


why "// Not null-terminated!"?
ToStr returns regular C strings.

LGTM as-is, with a few questions and suggestions

vitalybuka requested changes to this revision. Wed, Oct 16, 2:25 PM
This revision now requires changes to proceed. Wed, Oct 16, 2:25 PM