This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Format String optimizations
AbandonedPublic

Authored by xbolva00 on May 21 2018, 1:50 PM.

Details

Reviewers
efriedma
bkramer
Summary

Inserts constants into format strings.

printf("Hello, %s", "world") -> printf("Hello, world")

Diff Detail

Event Timeline

xbolva00 created this revision.May 21 2018, 1:50 PM

TODO:

  • Fix things requested in reviews
  • Add support for fprintf, sprintf, snprintf

Some tests fail with this change:
CodeGen/X86/no-plt-libcalls.ll
Transforms/InstCombine/printf-1.ll
Transforms/InstCombine/printf-2.ll

I will look into these.

xbolva00 updated this revision to Diff 147865.May 21 2018, 2:02 PM

Updated other tests

xbolva00 updated this revision to Diff 147875.May 21 2018, 2:37 PM

Updated constant propagation.

I'm not sure we can rewrite calls to varargs functions safely in general given the current state of the C ABI rules in LLVM.

Sometimes clang does weird things to conform with the ABI rules, because the LLVM type system isn't the same as the C system. For most functions, it's pretty easy to tell it happened: if the IR signature of the function doesn't match the expected signature, something weird happened, so we can just bail out. But varargs functions don't specify a complete signature, so we can't tell if the clang ABI code was forced to do something weird, like split an argument into multiple values, or insert a padding value. For example, for the target mips64-unknown-linux-gnu, a call like printf("asdf%Lf", 1.0L); gets lowered to the following:

%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i64 undef, fp128 0xL00000000000000003FFF000000000000) #2

xbolva00 updated this revision to Diff 147884.May 21 2018, 2:58 PM

Fixed cases like printf("%s%d", "", 1) when propagating constants into the format string.

xbolva00 added a comment.EditedMay 21 2018, 3:00 PM

I'm not sure we can rewrite calls to varargs functions safely in general given the current state of the C ABI rules in LLVM.

Sometimes clang does weird things to conform with the ABI rules, because the LLVM type system isn't the same as the C system. For most functions, it's pretty easy to tell it happened: if the IR signature of the function doesn't match the expected signature, something weird happened, so we can just bail out. But varargs functions don't specify a complete signature, so we can't tell if the clang ABI code was forced to do something weird, like split an argument into multiple values, or insert a padding value. For example, for the target mips64-unknown-linux-gnu, a call like printf("asdf%Lf", 1.0L); gets lowered to the following:

%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i64 undef, fp128 0xL00000000000000003FFF000000000000) #2

Ahh, so should I stop working on this then?

xbolva00 added a comment.EditedMay 21 2018, 3:06 PM

It is also interesting to compare Clang and GCC on printf("asdf%Lf", 1.0L);. GCC does some weird magic :D

https://godbolt.org/g/CSW4Ti

I tested some architectures; only MIPS seems to produce such weird IR.

Maybe it is possible to enable this transformation for some targets only, e.g. x86?

xbolva00 edited the summary of this revision.May 22 2018, 2:29 AM
xbolva00 updated this revision to Diff 147981.May 22 2018, 3:56 AM

Restricted to x86 for now.
Fixed a bug where an argument itself contains a format string, and added tests.

xbolva00 abandoned this revision.May 22 2018, 5:26 PM
rsmith added a subscriber: rsmith.May 22 2018, 6:03 PM

Converting to printf("single string literal") will likely be a small performance pessimization (because the string literal must be scanned for embedded % characters). I think the most efficient form is instead likely to be a format string comprising *only* format specifiers; for example, given

printf("hello %s, my favorite number is %d because it is %s", "world", n, "prime");

... when optimizing for speed, the best code we can produce is probably

printf("%s%d%s", "hello world, my favorite number is ", n, " because it is prime");

... whereas when optimizing for size, it's probably

printf("hello world, my favorite number is %d because it is prime", n);

(but in the latter case we probably also want to check that each of the input string literals has only one use or we risk increasing the data size).
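
As a sanity check on the three forms above, here is a small C sketch that renders each of them into a buffer with snprintf and compares the results. The function names render and forms_agree are made up for this illustration and are not part of the patch. The sketch only shows that the forms produce the same output; it says nothing about their relative speed or size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Renders the example message in one of three forms:
   0 = the original call,
   1 = a format made only of specifiers (the "speed" form),
   2 = constants folded into the format (the "size" form). */
int render(char *buf, size_t size, int form, int n) {
    assert(form >= 0 && form <= 2);
    switch (form) {
    case 0:
        return snprintf(buf, size,
                        "hello %s, my favorite number is %d because it is %s",
                        "world", n, "prime");
    case 1:
        return snprintf(buf, size, "%s%d%s",
                        "hello world, my favorite number is ", n,
                        " because it is prime");
    default:
        return snprintf(buf, size,
                        "hello world, my favorite number is %d because it is prime",
                        n);
    }
}

/* Returns 1 when all three forms render the same text for this n. */
int forms_agree(int n) {
    char a[128], b[128], c[128];
    render(a, sizeof a, 0, n);
    render(b, sizeof b, 1, n);
    render(c, sizeof c, 2, n);
    return strcmp(a, b) == 0 && strcmp(a, c) == 0;
}
```

Calling forms_agree(7) should return 1. Note that the folded forms are only valid because none of the constant strings here contains a '%' character.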

test/Transforms/InstCombine/format-str.ll
31

The transform miscompiles this case. The original program prints "str: str: %s", whereas the transformed program makes an invalid call to printf.
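
One plausible way to avoid this class of miscompile, sketched here in plain C rather than as the actual InstCombine change, is to double every '%' in a constant string before splicing it into the format string. The helpers escape_for_format and fold_prefix_and_arg are hypothetical names, and the "str: " prefix is only inferred from the output quoted above, not taken from the test file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies arg into out, doubling every '%' so that the
   result is literal text when used inside a printf format string.
   Returns the number of characters written, or -1 if out is too small. */
int escape_for_format(const char *arg, char *out, size_t size) {
    size_t j = 0;
    assert(arg != NULL && out != NULL);
    for (size_t i = 0; arg[i] != '\0'; ++i) {
        size_t need = (arg[i] == '%') ? 2 : 1;
        if (j + need >= size)
            return -1;              /* keep room for the terminating NUL */
        if (arg[i] == '%')
            out[j++] = '%';         /* emit the extra escaping '%' */
        out[j++] = arg[i];
    }
    if (j >= size)
        return -1;                  /* only reachable for empty arg, size 0 */
    out[j] = '\0';
    return (int)j;
}

/* Hypothetical helper: folds a constant argument into a format of the shape
   "<prefix>%s". The prefix is already format text, so it is copied as-is. */
int fold_prefix_and_arg(const char *prefix, const char *arg,
                        char *out, size_t size) {
    size_t plen = strlen(prefix);
    if (plen >= size)
        return -1;
    memcpy(out, prefix, plen);
    int n = escape_for_format(arg, out + plen, size - plen);
    return n < 0 ? -1 : (int)plen + n;
}
```

With prefix "str: " and argument "str: %s", the folded format becomes "str: str: %%s", which printf renders as the text the original program printed instead of reading a missing argument.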

xbolva00 added a comment.EditedMay 22 2018, 6:07 PM

Too bad we cannot transform printf(str) to fputs(str, stdout) or fwrite in general. It would be quite interesting, I think. But since we know how stdout is represented under GNU/glibc, could we do it under the condition "isLinux && isGNU"?

CI->getModule()->getGlobalVariable("stdout")?

But this can be done only for constant strings, since "%%" could be in the format string.

What do you think?
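
To make the "%%" concern concrete, here is a rough C sketch of the check such a rewrite would need on a constant format string. The name unescape_format is invented for this example and is not an LLVM API. It also assumes the result of printf is unused, because printf returns the number of characters written while fputs only returns a non-negative value on success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts a constant format string that takes no
   arguments into the literal text printf would emit, collapsing "%%" to "%".
   Returns the length of the text, or -1 if the format contains a real
   conversion (or a stray '%'), in which case the printf call must stay. */
int unescape_format(const char *fmt, char *out, size_t size) {
    size_t j = 0;
    assert(fmt != NULL && out != NULL);
    /* The output is never longer than the input, so one check suffices. */
    if (strlen(fmt) >= size)
        return -1;
    for (size_t i = 0; fmt[i] != '\0'; ++i) {
        if (fmt[i] == '%') {
            if (fmt[i + 1] != '%')
                return -1;          /* not a "%%" pair: keep printf */
            ++i;                    /* skip to the second '%' of the pair */
        }
        out[j++] = fmt[i];
    }
    out[j] = '\0';
    return (int)j;
}
```

When the helper succeeds, the text in out is what could be passed to fputs(out, stdout). For "100%% done" it yields "100% done"; for "value: %d" it returns -1 and the call is left alone.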