This is an archive of the discontinued LLVM Phabricator instance.

[clang][ThreadSafety] Analyze known function pointer values
AbandonedPublic

Authored by tbaeder on Jun 6 2023, 1:41 AM.

Details

Summary

When we're calling a function pointer, try to get the currently known value of the variable and analyze calling that instead.

I think ideally we would get all possible values at this point and analyze them all(?), but I'm not sure how to implement this.
I've not added a test case because of the above issue/uncertainty, so I'd like some advice on how to implement that or if it's not what should happen.

(I've added everyone with >100 lines in this file as a reviewer, except Caitlin Sadowski, who I can't find in Phabricator.)

Diff Detail

Event Timeline

tbaeder created this revision.Jun 6 2023, 1:41 AM
Herald added a project: Restricted Project. · View Herald Transcript
tbaeder requested review of this revision.Jun 6 2023, 1:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 1:41 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I think ideally we would get all possible values at this point and analyze them all(?), but I'm not sure how to implement this.

Err, if we don't know the callee, there's not a whole lot we can do. I don't think we'd want to make guesses at what potentially could be stored in the function pointer at that point.

Overall, though, I think this direction makes sense, but I'd like to hear from @aaronpuchert and/or @delesley.

clang/lib/Analysis/ThreadSafety.cpp
2066–2068
tbaeder updated this revision to Diff 530177.Jun 10 2023, 12:35 AM
tbaeder marked an inline comment as done.
aaronpuchert added a comment.EditedJun 11 2023, 3:00 PM

That's a tough one. The change seems to be correct, and makes more patterns visible. Of course I wonder how it comes that we see calls of a function pointer with a uniquely determined value, as that would seem like obfuscation. Maybe you can show an example how this might look like in practice?

Another issue (that of course you can't know about) is that I'm not sure about the LocalVariableMap. It's currently only used to find a try-acquire function return value from a branch, not in the core analysis and can apparently be quite expensive for long functions with exception handling edges. I'd like to get rid of it, unless we decide to use it for resolving aliases more widely. But I'm wary of alias analysis, because we don't want to reimplement the static analyzer, and we also want to be more predictable, maybe even provide correctness guarantees. Alias analysis ends in undecidable territory quite early on, and using heuristics would sacrifice predictability.

The relatively strict "dumb" analysis that we're doing today seems to work well enough for the purpose of tracking locks, though we can look through local references. (Their pointee doesn't change. Though it doesn't always work.) My hope would be that we can leave it at that and don't build a fully-fledged alias analysis.

I'd like if we could address the issue here by moving to type attributes. These would of course be "sugar", i.e. dropped on canonicalization, as we don't want them to affect overloading etc. But that would still allow more than declaration attributes, like in this case, warning if we drop the attribute on assignment. Of course that's big project though, and it's not clear if it'll work out. The paper by @delesley and @aaron.ballman briefly says that non-sugared attributes would probably not work, and there is a talk where @delesley says something along the lines of "we'd really like to integrate this into the type system, but it would be quite difficult".

That's a tough one. The change seems to be correct, and makes more patterns visible. Of course I wonder how it comes that we see calls of a function pointer with a uniquely determined value, as that would seem like obfuscation. Maybe you can show an example how this might look like in practice?

Another issue (that of course you can't know about) is that I'm not sure about the LocalVariableMap. It's currently only used to find a try-acquire function return value from a branch, not in the core analysis and can apparently be quite expensive for long functions with exception handling edges. I'd like to get rid of it, unless we decide to use it for resolving aliases more widely. But I'm wary of alias analysis, because we don't want to reimplement the static analyzer, and we also want to be more predictable, maybe even provide correctness guarantees. Alias analysis ends in undecidable territory quite early on, and using heuristics would sacrifice predictability.

That's good to know, thank you for bringing this up! I agree that we likely don't want to do alias analysis. The way CallExpr implements getCalleeDecl() is to check whether the callee expression (after ignoring implicit casts and dereference operators) references a declaration of some kind (DeclRefExpr, MemberExpr, BlockExpr) and if so, report back the declaration found. So it does not do any sort of tricky dataflow analysis and only handles relatively simple cases like: void func() {} void (*fp)() = func; fp();

The relatively strict "dumb" analysis that we're doing today seems to work well enough for the purpose of tracking locks, though we can look through local references. (Their pointee doesn't change. Though it doesn't always work.) My hope would be that we can leave it at that and don't build a fully-fledged alias analysis.

I'd like if we could address the issue here by moving to type attributes. These would of course be "sugar", i.e. dropped on canonicalization, as we don't want them to affect overloading etc. But that would still allow more than declaration attributes, like in this case, warning if we drop the attribute on assignment. Of course that's big project though, and it's not clear if it'll work out. The paper by @delesley and @aaron.ballman briefly says that non-sugared attributes would probably not work, and there is a talk where @delesley says something along the lines of "we'd really like to integrate this into the type system, but it would be quite difficult".

Yeah, I seem to recall this adding enough overhead to the type system for it to be problematic. I think we were lamenting that Clang doesn't have a pluggable type system and we'd love to see one added, but that's a big, experimental undertaking.

Maybe you can show an example how this might look like in practice?

AFAIU, the real-world case is more a C "class" struct with a couple of function pointers (a vtable), which might be different depending on circumstances.

So, the problem with this (type of) analysis is that we don't have a perfect view of the (global) program state, right? The CFG is per-function, and any other function (etc.) might change a function pointer. And we don't even know its initial value. Correct? The CFG-based anaylsis is just not enough to reliably diagnose this sort of problem.

So, the problem with this (type of) analysis is that we don't have a perfect view of the (global) program state, right? The CFG is per-function, and any other function (etc.) might change a function pointer. And we don't even know its initial value. Correct? The CFG-based anaylsis is just not enough to reliably diagnose this sort of problem.

Exactly, the analysis is strictly intraprocedural. So we'll only see any value if initialization/assignment and call are in the same function. And if the value is uniquely determined, the question is why does the function do an indirect call at all? I could imagine this in something like a unit test, but these are not so interesting for static analysis.

So basically the code would need to look like this:

void f() __attribute__((requires_capability(mu)));

void g() {
  void (*pf)() = f;
  pf();
}

But why would someone write this instead of a direct call to f?

@aaronpuchert Do you think warning on assignment of function pointers with mismatched attributes is would be a viable way forward? This is what https://github.com/elmarco/clang/commit/bac94282a5c8e3b0410ee8c0522fbdb872ade00c tries to implement IIUC.

Do you think warning on assignment of function pointers with mismatched attributes would be a viable way forward?

Yes, that sounds like the right approach. (Slightly relaxed perhaps, for example adding requires or excludes on the cast should be fine.)

What makes this tricky is that we currently have declaration attributes, but we might need type attributes to do this properly.

tbaeder abandoned this revision.Oct 2 2023, 11:53 PM