This is an archive of the discontinued LLVM Phabricator instance.

[ADT] function_ref captures function pointers by value
Needs ReviewPublic

Authored by sammccall on Jul 26 2021, 2:56 AM.

Details

Reviewers
chandlerc
rsmith
Summary

Currently function_ref always stores a pointer to its callee, and
expects the callee to stay alive.
This has two surprising consequences.

First, function_ref(&someFreeFunction) is often incorrect: it stores a pointer
to a *temporary* function pointer, and the outer pointer will often dangle.
function_ref(someFreeFunction) does the right thing, stores a plain pointer.
This is surprising because referring to a free function as x or &x often
has the same effect (or produces a diagnostic).

Second, function_ref(someFreeFunction) is miscompiled by GCC 5: it
behaves like function_ref(&someFreeFunction).
https://godbolt.org/z/G8Tj4zWWW

Regarding compatibility with upcoming std::function_ref, the
function_ref(&someFreeFunction) case is mentioned in
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0792r5.html
but I can't really understand how the resolution relates to it.

Diff Detail

Event Timeline

sammccall created this revision.Jul 26 2021, 2:56 AM
sammccall requested review of this revision.Jul 26 2021, 2:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2021, 2:56 AM

I suppose this "breaks" using function_ref to capture by reference a function pointer that later gets reassigned. This doesn't seem very likely to be an issue?

What advantage would this narrow use case have over using a raw function pointer (a function pointer member, in the case of the "S" example, for instance)? implicit parameter/return type conversions? Usually I find those to be more hazardous than intentional.

An alternative would be to delete this overload and insist callers pass a lambda that itself calls the desired function. This makes the user of function_ref more verbose, but it avoids having two dynamic dispatches (I think).

What advantage would this narrow use case have over using a raw function pointer (a function pointer member, in the case of the "S" example, for instance)?

Typically you accept function_ref in an interface, and don't want to constrain callers to only passing things that can be expressed as function pointers.

In the case where we found the S bug, this wasn't the case (it was just a function pointer type with better syntax). But AFAIK that's just a coincidence.

An alternative would be to delete this overload and insist callers pass a lambda that itself calls the desired function. This makes the user of function_ref more verbose, but it avoids having two dynamic dispatches (I think).

Yeah, the double indirection is a fair point, using function pointers to represent functions we know statically is wasteful.

But the ergonomics are pretty bad for functions with complicated (matching) signatures. And it will involve uglifying lots of places where there's no *correctness* problem even on buggy GCC. (If the function_ref is passed to a function and invoked synchronously then the pointer lives long enough).

chandlerc added inline comments.Jul 26 2021, 6:17 PM
llvm/include/llvm/ADT/STLExtras.h
168–170

The more I think about this the more I feel like we shouldn't try to make this promise... it seems too narrow to be useful. I think maybe if we want to do something here, causing things to break immediately rather than kinda-sorta working is better.

For the code where this came up, I think just moving away from function_ref is a much better approach than relying on the special behavior in the case of a pointer.

But happy to defer to dblaikie here ultimately.

Seems like there's 3 alternatives at this point:

  • support function pointers by value (this patch)
  • disallow function pointers/references entirely (breaks existing code)
  • function pointers are UB if the pointer dies, work around the GCC bug somehow (std::function_ref takes this path)
llvm/include/llvm/ADT/STLExtras.h
168–170

it seems too narrow to be useful

I think it's useful - it fixes broken code people expect to write e.g. this looks like an existing bug. And I'm not sure that making it broader would actually be more useful! Nobody expects to be able to *store* e.g. a temporary lambda in a function_ref, but I think that expectation exists for function pointers.

However FWIW on the standard track it's not going to be supported for consistency reasons: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0792r5.html#lifetime-of-pointers-to-function (missed this on my first scan).

For the code where this came up, I think just moving away from function_ref is a much better approach

Agreed, I've committed such a fix to make sure we have something in time for the 13 branch.

What advantage would this narrow use case have over using a raw function pointer (a function pointer member, in the case of the "S" example, for instance)?

Typically you accept function_ref in an interface, and don't want to constrain callers to only passing things that can be expressed as function pointers.

In the case where we found the S bug, this wasn't the case (it was just a function pointer type with better syntax). But AFAIK that's just a coincidence.

I'm not sure I follow how "that's just a coincidence" - it looks to me like the only place this could come up is a place where lambdas and other things that aren't function pointers would not be usable. So this feature could only be used in cases where the constraint is "only use function pointers, if you use anything else it'll break subtly/invoke UB". That doesn't seem like a good API design to carve out this special case while still leaving most of the uses of function_ref in the same context, UB.

I guess it comes down to: Are there situations where this patch fixes code that would've been invalid, but where the same code could use a lambda and be correct?

llvm/include/llvm/ADT/STLExtras.h
168–170

Yeah, that's basically where I am too.

(a temporary lambda, specifically)