This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Don't use an undefined '+' in unsigned/octal/hexal print formats
ClosedPublic

Authored by mstorsjo on Jun 1 2021, 3:24 AM.

Details

Summary

If building code like this:

unsigned long val = 1000;
snprintf(buf, sizeof(buf), "%+lu", val);

with clang, clang warns

warning: flag '+' results in undefined behavior with 'u' conversion specifier [-Wformat]

Therefore, don't construct such undefined format strings.

This fixes number formatting in mingw-w64 if built with
__USE_MINGW_ANSI_STDIO defined (there, the '+' flag causes a
leading plus to be printed when formatting unsigned numbers too,
while the '+' flag doesn't cause any extra leading plus in other
stdio implementations).

Diff Detail

Event Timeline

mstorsjo requested review of this revision.Jun 1 2021, 3:24 AM
mstorsjo created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2021, 3:24 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added a subscriber: Quuxplusone.

Looks reasonable to me. FWIW, a more invasive rewrite would put the letter-computing code first:

    char letter = ((__flags & ios_base::basefield) == ios_base::oct) ? 'o' :
        ((__flags & ios_base::basefield) == ios_base::hex) ? 'x' :
        __signd ? 'd' : 'u';
    if (letter == 'd' && (__flags & ios_base::showpos))
        *__fmtp++ = '+';
    if (__flags & ios_base::showbase)
        *__fmtp++ = '#';
    while (*__len)
        *__fmtp++ = *__len++;
    *__fmtp++ = letter;
}

(Notice the names don't have to be _Uglified in this file because it's not a header.)

libcxx/src/locale.cpp
4600

Please parenthesize (__flags & ios_base::showpos) for clarity. (Otherwise I could imagine a compiler thinking you'd typoed & for &&.)

SMGT, but I'd like some unit tests to test the new correct behaviour. I think @Quuxplusone's proposed change is more readable, but I don't object against your version.

Looks reasonable to me. FWIW, a more invasive rewrite would put the letter-computing code first:

Yup, that'd work too, and it seems like others prefer that version too, so I guess I can rewrite it that way.

SMGT, but I'd like some unit tests to test the new correct behaviour.

Hmm, so a test under libcxx/test/libcxx somewhere, with a local class subclassing std::__1::__num_put_base so that the protected method __format_int can be called, and then testing the different flag combinations?

Mordante accepted this revision.Jun 3 2021, 11:12 PM

SMGT, but I'd like some unit tests to test the new correct behaviour.

Hmm, so a test under libcxx/test/libcxx somewhere, with a local class subclassing std::__1::__num_put_base so that the protected method __format_int can be called, and then testing the different flag combinations?

Hmm I see testing this isn't that trivial. In that case I'd rather see a buildbot as suggested on Discord. But that's out of the scope of this patch.
LGTM, modulo @Quuxplusone's parenthesize request.
It looks like the build failures aren't caused by this patch, but keep an eye out after landing this patch.

This revision is now accepted and ready to land.Jun 3 2021, 11:12 PM
This revision was landed with ongoing or failed builds.Jun 4 2021, 2:09 AM
This revision was automatically updated to reflect the committed changes.