Inserts constants into format strings.
printf("Hello, %s", "world") -> printf("Hello, world")
Differential D47159
[InstCombine] Format String optimizations
Authored by xbolva00 on May 21 2018, 1:50 PM.
Details
Some tests fail with this change; I will check it.

I'm not sure we can rewrite calls to varargs functions safely in general, given the current state of the C ABI rules in LLVM. Sometimes clang does weird things to conform with the ABI rules, because the LLVM type system isn't the same as the C type system. For most functions, it's pretty easy to tell it happened: if the IR signature of the function doesn't match the expected signature, something weird happened, so we can just bail out. But varargs functions don't specify a complete signature, so we can't tell if the clang ABI code was forced to do something weird, like split an argument into multiple values or insert a padding value. For example, for the target mips64-unknown-linux-gnu, a call like printf("asdf%Lf", 1.0L); gets lowered to the following:
    %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i64 undef, fp128 0xL00000000000000003FFF000000000000) #2

It is also interesting to compare Clang and GCC on printf("asdf%Lf", 1.0L); GCC does some weird magic :D I tested some architectures, and only mips seems to have such weird IR. Maybe it is possible to turn this transformation on for some targets, e.g. x86?

Converting to printf("single string literal") will likely be a small performance pessimization (because the string literal must be scanned for embedded % characters). I think the most efficient form is instead likely to be a format string comprising *only* format specifiers; for example, given
    printf("hello %s, my favorite number is %d because it is %s", "world", n, "prime");
... when optimizing for speed, the best code we can produce is probably
    printf("%s%d%s", "hello world, my favorite number is ", n, " because it is prime");
... whereas when optimizing for size, it's probably
    printf("hello world, my favorite number is %d because it is prime", n);
(but in the latter case we probably also want to check that each of the input string literals has only one use or we risk increasing the data size).
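
A minimal C sketch of the three forms from the comment above, written with snprintf so the outputs can be compared byte for byte. The helper names (format_original, format_for_speed, format_for_size, forms_agree) are made up for illustration and are not part of the patch. The sketch only checks that both rewrites preserve the output; it makes no claim about which form is actually faster.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Original call: literal text interleaved with specifiers. */
static int format_original(char *buf, size_t size, int n) {
    return snprintf(buf, size,
                    "hello %s, my favorite number is %d because it is %s",
                    "world", n, "prime");
}

/* Speed-oriented form: the format string holds only specifiers, and the
   constant text has been merged into the string arguments. */
static int format_for_speed(char *buf, size_t size, int n) {
    return snprintf(buf, size, "%s%d%s",
                    "hello world, my favorite number is ", n,
                    " because it is prime");
}

/* Size-oriented form: constant arguments are folded into the format
   string, leaving only the non-constant %d. */
static int format_for_size(char *buf, size_t size, int n) {
    return snprintf(buf, size,
                    "hello world, my favorite number is %d because it is prime",
                    n);
}

/* Returns 1 when all three forms produce identical output for n. */
int forms_agree(int n) {
    char a[128], b[128], c[128];
    int la = format_original(a, sizeof a, n);
    int lb = format_for_speed(b, sizeof b, n);
    int lc = format_for_size(c, sizeof c, n);

    /* The buffers are large enough for any int, so nothing is truncated. */
    assert(la >= 0 && (size_t)la < sizeof a);

    return la == lb && lb == lc && strcmp(a, b) == 0 && strcmp(b, c) == 0;
}
```

Calling forms_agree with any int should return 1, since the three calls differ only in where the constant text lives.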

Too bad we cannot transform printf(str) to fputs(str, stdout)/fwrite. It would be quite interesting, I think. But since we know how stdout is represented under GNU/glibc, could we do it under the condition "isLinux && isGNU", using CI->getModule()->getGlobalVariable("stdout")? This can only be done for constant strings, though, since "%%" could be in the format string. What do you think?

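
A rough C sketch of the condition described above: printf(str) can only be replaced by fputs(str, stdout) when str is a known constant with no '%' in it, because "%%" must print as a single '%'. The helpers is_plain_format and emit_plain are hypothetical names for illustration, not code from the patch or from LLVM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 when fmt contains no '%' at all, so
   printf(fmt) and fputs(fmt, stdout) would write the same bytes. */
int is_plain_format(const char *fmt) {
    assert(fmt != NULL);
    return strchr(fmt, '%') == NULL;
}

/* Hypothetical helper: writes a plain string without going through the
   format-string parser. The caller must have checked is_plain_format. */
int emit_plain(const char *s, FILE *out) {
    assert(is_plain_format(s));
    return fputs(s, out);
}
```

Note that printf returns the number of characters written while fputs only returns a nonnegative value on success, so a rewrite like this is only safe when the result of the call is unused.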
The transform miscompiles this case. The original program would print out "str: str: %s" and the transformed program will make an invalid call to printf.
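
The test case itself is not quoted here. Assuming it has the shape printf("str: %s", "str: %s"), pasting the argument into the format string verbatim gives printf("str: str: %s"), which expects an argument that is no longer passed. One possible fix, sketched in C below with a hypothetical helper escape_percent, is to double every '%' in the constant before splicing it in, so the folded format string becomes "str: str: %%s". Bailing out whenever the constant contains '%' would be another option.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, doubling every '%' so the
   result can be spliced into a printf format string without introducing
   new conversion specifiers. Returns the number of characters written
   (excluding the terminator), or -1 if dst is too small. */
int escape_percent(const char *src, char *dst, size_t size) {
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (size == 0)
        return -1;

    for (; *src != '\0'; ++src) {
        size_t need = (*src == '%') ? 2 : 1;
        if (out + need >= size)
            return -1;          /* keep one byte free for the terminator */
        if (*src == '%')
            dst[out++] = '%';   /* emit the extra '%' first */
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return (int)out;
}
```

With this helper, the argument "str: %s" is escaped to "str: %%s" before being folded into the format string.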