This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyLibcalls] Optimize string concats using s(n)printf
AbandonedPublic

Authored by xbolva00 on May 9 2018, 10:17 AM.

Details

Summary

sprintf(buf, "%s%s", buf, str) -> strcat(buf, str)

snprintf(buf, n, "%s%s", buf, str) -> strncat(buf, str, n)

Diff Detail

Repository
rL LLVM

Event Timeline

xbolva00 created this revision.May 9 2018, 10:17 AM
lebedev.ri added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
1841
PRINTF(3)

<...>

NOTES
       Some programs imprudently rely on code such as the following

           sprintf(buf, "%s some further text", buf);

       to  append  text to buf.  However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling sprintf(), snprintf(), vsprintf(), and vsnprintf().  Depending on the version of gcc(1) used, and the com‐
       piler options employed, calls such as the above will not produce the expected results.

       The glibc implementation of the functions snprintf() and vsnprintf() conforms to the C99 standard, that is, behaves as described above, since glibc version 2.1.  Until glibc 2.0.6, they would return -1 when the output was truncated.
xbolva00 added inline comments.May 9 2018, 10:39 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
1841

char buf[20] = "hello ";
sprintf(buf, "%s%s", buf, "world");
puts(buf);

No build warnings, and maybe they say that undefined results but I always get the correct results.
I hit no problems when trying these codes and transformations.

I am not sure why would this "possibly undefined results from sprintf" case should block this transformation. E.g. we perform many optimizations just because of UB... But sure, suggestions welcome.

Eli?
@efriedma

Yes, we can do anything if we prove the code has undefined behavior, but I'm not sure why we would prefer a call to strncat over "unreachable".

xbolva00 added a comment.EditedMay 9 2018, 11:53 AM

Yes, we can do anything if we prove the code has undefined behavior, but I'm not sure why we would prefer a call to strncat over "unreachable".

"unreachable" is emitted when (in terms of this transformation)?
snprintf(buf, n, "%s%s", buf, str) works normally, I see no "unreachable".

If your results appear to be "correct", you're just getting lucky that your particular combination of compiler and C library don't do something else.

We should probably just add clang -Wformat warning, rather than try to optimize it in SimplifyLibCalls.

If your results appear to be "correct", you're just getting lucky that your particular combination of compiler and C library don't do something else.

We should probably just add clang -Wformat warning, rather than try to optimize it in SimplifyLibCalls.

And ubsan, if it does not handle this already.

xbolva00 accepted this revision.May 9 2018, 2:21 PM

-Wformat does not handle it.

UBsan.. I tried but I got some error on my system
/usr/bin/ld: cannot find /usr/local/lib/clang/7.0.0/lib/linux/libclang_rt.ubsan_standalone-x86_64.a

Anyway ok, I will close this.

This revision is now accepted and ready to land.May 9 2018, 2:21 PM
xbolva00 closed this revision.EditedMay 9 2018, 2:22 PM

Sorry, accidentally clicked to accept.

smeenai added a subscriber: smeenai.May 9 2018, 2:56 PM

If you're not planning to move forward with the diff, it's better to abandon it rather than close it. Close usually implies it was committed.

xbolva00 reopened this revision.May 9 2018, 3:02 PM
This revision is now accepted and ready to land.May 9 2018, 3:02 PM
xbolva00 abandoned this revision.May 9 2018, 3:02 PM

I am just wondering...

snprintf(buf , n, "some data %s", buf) is UB..
but..
snprintf(buf, n , "%s", buf) should be totally fine. Also for %s%s. If fmt starts with "%s", there should not be a problem.

Eli?

xbolva00 reclaimed this revision.May 9 2018, 3:22 PM
This revision is now accepted and ready to land.May 9 2018, 3:22 PM
xbolva00 requested review of this revision.May 9 2018, 3:22 PM

The C standard is very clear: "If copying takes place between objects that overlap, the behavior is undefined." There is no exception for format strings which start with %s.