Page MenuHomePhabricator

[InstCombine] Adjust snprintf folding of constant strings (PR #56598)
ClosedPublic

Authored by msebor on Jul 25 2022, 8:54 AM.

Details

Summary

The snprintf library call handler doesn't consider the POSIX requirement that the function fail and set errno = EOVERFLOW when the second argument, n, is greater than INT_MAX, and folds the subset of such calls to the function with strings of known length to constants. At the same time the handler only folds these calls if the second argument is greater than the length of the string, and avoids those where it would imply truncation.

This change adjusts the handler to fold all calls with constant strings up to n == INT_MAX but not others. This lets the calls that are required to fail according to POSIX fail on conforming implementations, while letting the folder emit efficient code for the truncating subset of the calls.

Diff Detail

Unit TestsFailed

TimeTest
60,080 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; /usr/bin/cmake /var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld
60,060 msx64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LeakTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-leak.test.tmp-LeakTest
60,040 msx64 debian > libFuzzer.libFuzzer::minimize_crash.test
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/NullDerefTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/minimize_crash.test.tmp-NullDerefTest
60,070 msx64 debian > libFuzzer.libFuzzer::out-of-process-fuzz.test
Script: -- : 'RUN: at line 2'; rm -rf /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/out-of-process-fuzz.test.tmp && mkdir /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/out-of-process-fuzz.test.tmp
60,030 msx64 debian > libFuzzer.libFuzzer::value-profile-load.test
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LoadTest.cpp -fsanitize-coverage=trace-gep -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/value-profile-load.test.tmp-LoadTest
View Full Test Results (79 Failed)

Event Timeline

msebor created this revision.Jul 25 2022, 8:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 8:54 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.Jul 25 2022, 8:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 8:54 AM
majnemer added inline comments.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2846

Does this do the right thing for targets whose INT_MAX is 16-bit?

msebor added inline comments.Jul 25 2022, 1:43 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2846

This function isn't executed for such targets because it's gated by TargetLibraryInfoImpl::isValidProtoForLibFunc which expects int to have 32 bits. This may be an okay assumption to make in this case since (if) most 16-bit targets are bare metal these days, but it's not an unavoidable one (not to mention it's not likely to be intended). I only last week (in D129915) discovered the relatively recently introduced getIntSize() function (added via D99438). If/when the part of D129915 that lifts the 32-bit assumption is accepted, this function could also be used here and in the rest of the code that makes it. I'm happy to do that in a follow-up.

majnemer added inline comments.Jul 25 2022, 2:15 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
2846

Sounds good!

There are two separate conditions noted in POSIX related to EOVERFLOW (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html):

  • The value to be returned is greater than {INT_MAX}
  • The value of n is greater than {INT_MAX}

To be clear, this is intended to address the case of the value of "n", but not the case of the value to be returned? (I guess it would be a bit of a pain to write a testcase.)

msebor updated this revision to Diff 448379.EditedJul 28 2022, 10:46 AM

I think I just equated the 32-bit StringRef limit with INT_MAX. Thanks for pointing that out!

While re-reviewing the function I noticed a couple more instances of the same problem that I had missed. Rather than duplicating the fix in all three places I factored it out into a new helper and added more tests. I also removed the 32-bit int assumption noted by @majnemer (it's not exercised until TargetLibraryInfoImpl::isValidProtoForLibFunc is adjusted as I explained).

As for testing the handling of excessive output, I can't think of a way without enhancing the function to handle more complex format strings, including directives with a width. Then we could use snprintf(0, 0, "x%*s", INT_MAX, "") to exceed the limit without hardcoding long strings. It's probably not worth the complexity just to test this unlikely case but supporting more involved format strings with multiple %s directives would make it possible to transform the idiom of using snprintf to concatenate strings, like snprintf(buf, sizeof buf, "%s/%s", dirname, filename), and supporting precision (and width) is simple. Another improvement to make this generally useful is to handle nonconstant strings allocated in arrays of known size. But that's for another day.

Ping: @efriedma, do you (or anyone else) have any other questions/comments or is everyone happy with the last revision?

This revision is now accepted and ready to land.Aug 10 2022, 2:02 PM
This revision was landed with ongoing or failed builds.Aug 15 2022, 3:00 PM
This revision was automatically updated to reflect the committed changes.