This is an archive of the discontinued LLVM Phabricator instance.

[FunctionAttrs] Detect printf-like functions and propagate readonly+nocapture to variadic operands
AbandonedPublic

Authored by jmolloy on Nov 9 2015, 6:15 AM.

Details

Summary

If a function is just a wrapper around vsprintf or friends, in some cases it neither captures nor writes its arguments.

Because such a function is variadic, we are very conservative about any operand it is called with elsewhere in the compiler, so here if we can we should propagate readonly + nocapture to any operand to the function at all callsites.

A variadic pointer argument to a printf-like function is only sometimes nocapture+readonly. For example in this case foo is captured into bar:

sprintf(buf, "%p", &foo); sscanf(buf, "%p", &bar);

For this reason we have to be careful which pointer arguments we mark as nocapture, and we have to parse the format string to be able to understand how the pointer is being used. In this implementation, the only "safe" use of a pointer is "%s", as it is not possible to encode the pointer address textually via %s (unlike with almost any numeric operator: %d/%u/%f etc).

Even though this code only detects %s, it is still useful as %s is so widely used. Consider a "plugin structure":

struct a { const char identifier[32], void (*f)(); } A;
myprintf("ident: %s\n", &A.identifier);

Even if A were determined to be constant, no calls to A.f() could be speculated because a pointer into A is potentially captured. With this analysis we can determine that A is not captured so GlobalOpt can SROA the struct and make indirect calls to A.f() direct.

In the future this can be extended to other v* functions in the standard library that have defined behavior on their operands, such as vscanf and friends (these do not capture, but do write).

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 39685.Nov 9 2015, 6:15 AM
jmolloy retitled this revision from to [FunctionAttrs] Detect printf-like functions and propagate readnone+nocapture to variadic operands.
jmolloy updated this object.
jmolloy added reviewers: chandlerc, mcrosier.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.

Ping!

Adding more reviewers - is anyone available to give this a look?

Cheers,

James

Hi James,

Do you know what other parts of the compiler are preventing us from figuring out the readonly/nocapture attributes?

If we could use the existing infrastructure for this we might get a more general solution (I'm not saying that's possible).

Cheers,
Silviu

lib/Transforms/IPO/FunctionAttrs.cpp
881

Wouldn't it be enough to have readonly/nocapture? Why do we need to test for library functions?

Hi James,

Ok, that makes sense.

So this is a work-around for the lack of attributes for varargs in llvm IR (which affects mostly library functions?).

There might be alternative solutions (either allowing attributes for .. or being able to mark the entire function as nocapture),
but I can’t tell if that would be worth it or not… This is something that might be worth thinking about.

I’ll have to leave this review to someone else, I wouldn’t feel confident enough to give a lgtm in this area.

Cheers,
Silviu

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: 12 November 2015 16:41
To: reviews+D14497+public+665d022658dbb442@reviews.llvm.org; Silviu Baranga; James Molloy; chandlerc@gmail.com; mcrosier@codeaurora.org; hfinkel@anl.gov; david.majnemer@gmail.com
Cc: llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D14497: [FunctionAttrs] Detect printf-like functions and propagate readnone+nocapture to variadic operands

Hi Silviu,

The problem is that most of the compiler expects function argument attributes to be the same as call operand attributes. Most of FunctionAttrs is devoted to inferring and adding attributes to function arguments (see most of the code below my change, where it finds nocapture). For varargs, there is no such function argument as the number of call operands are greater than the number of function arguments. There's just no representation for "..." to have attributes in the IR.

As to why we check for LibFuncs, that's because it's possible to take a variable argument and do whatever you like with it. You could capture it, store it to a global, write over it... In fact, vscanf() does this exactly and writes into its variable arguments.

James

majnemer edited edge metadata.Nov 13 2015, 11:15 AM

If I am not mistaken, these functions may capture the variadic arguments in Glibc environments via register_printf_specifier and register_printf_function. Personally, I'd sleep fine at night if we assumed that registered handler functions cannot capture or store to the arguments.

lib/Transforms/IPO/FunctionAttrs.cpp
910–913

Your description says ReadNone but the code says ReadOnly.

sprintf definitely isn't nocapture; simple testcase:

#include <stdio.h>
#include <assert.h>
int main() {
    int a = 3;
    // Serialize the address of `a`
    char buf[32];
    sprintf(buf, "%p", &a);
    // Deserialize the address of `a`
    int* aptr;
    sscanf(buf, "%p", &aptr);
    // Check value of `a`
    assert(*aptr == 3);
}

sprintf can also write to its arguments: "%n" might not be particularly popular, but it's still legal.

jmolloy updated this revision to Diff 40285.Nov 16 2015, 7:27 AM
jmolloy retitled this revision from [FunctionAttrs] Detect printf-like functions and propagate readnone+nocapture to variadic operands to [FunctionAttrs] Detect printf-like functions and propagate readonly+nocapture to variadic operands.
jmolloy updated this object.
jmolloy edited edge metadata.

Hi David, Eli,

Thanks for looking this over. Eli, thanks for jumping in - you're completely right, and I hadn't considered this.

Going back to the drawing board a little, I think the single most important thing to be able to handle is "%s" specifiers. These are obviously incredibly common and I'd argue are might contribute the most to (non-legitmately) escaping pointers.

The revamped patch adds parsing support for printf-style format strings. The parsing isn't perfect, but it doesn't need to be - a best-effort approach is probably good enough for most cases.

The new patch will only ever add readonly+nocapture to pointer arguments known to be used in %s specifiers. It can be extended if we later find we want to infer attributes on other arguments (perhaps %p/%d/%u can still be readonly, just not nocapture? and %n can be nocapture but not readonly?)

I appreciate it's now significantly more code.

Cheers,

James

jmolloy abandoned this revision.Nov 17 2015, 8:24 AM

On second thoughts, this is getting a bit too complex and special-cased. I'll try and find an alternative solution,