- Introduces the unsafe_buffer_usage function attribute.
- Issues a warning at call sites to functions wearing this attribute.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/include/clang/Basic/AttrDocs.td | ||
---|---|---|
6214 | We should keep this in-sync with: | |
clang/lib/Sema/SemaDeclAttr.cpp | ||
8451 | I think we should remove the commented out code. |
So here's what we roughly think about the future of this attribute.
In this initial patch the idea is that we don't restrict the applicability of the attribute in any way. It can be put on functions that don't have any pointer parameters, or any parameters at all, or any return values, they can be plain functions or methods, whatever. The attribute simply says "This function isn't compatible with the Safe Buffers programming model" without specifying where the buffers are or where the bounds are or how to mitigate the problem.
Then later we're going to expand the attribute to offer automatic fixes. Then the default mode of this attribute should probably mean "There's an automatic fix, and it's *obvious*", eg.:
[[clang::unsafe_buffer_usage]] void foo(int *buf, size_t size); void foo(std::span<int> buf); foo(buf, size); // obvious automatic fix: foo(std::span(buf, __placeholder__));
And we're thinking of attribute misuse warning that informs the user that "The fix isn't obvious enough" (eg., couldn't figure out how to match parameters):
[[clang::unsafe_buffer_usage]] // warning: cannot figure out the automatic fix void foo(int *buf, size_t size); void foo(std::span<int> sp); // note: no parameter named 'buf' foo(buf, size); // warn about unsafe operation but no fixit
And, separately, there will be a "mode" of this attribute that says "There's no automatic fix, please fix all call sites manually" which corresponds to the current patch's implementation (maybe includes a human-readable explanation of how to mitigate the problem) (exact syntax TBD):
[[clang::unsafe_buffer_usage(__nofix, "use the bar() interface instead")]] // no warning, the attribute is fine void foo(int *buf, size_t size); foo(buf, size); // warning: use the bar() interface instead
Why do I think the default mode should provide a fix? Imagine you're a library vendor, and you're updating your APIs to conform to the Safe Buffers programming model. Then you may add an attribute, but you don't necessarily have an easy way to test whether the compiler will emit an automatic fix or not. And if you forget to test that, it silently breaks our entire tool, because any user code can no longer be modernized automatically. Or if the function definitions diverge and the original automatic fix can no longer be applied, that'd be another silent breakage scenario.
Which is why I think __nofix *cannot* be the default behavior. If folks are breaking our tool (that they clearly intended to support, given that they added the attribute in the first place), they should know about that and explicitly opt in. And the good news is, the compiler can fully figure out whether an automatic fix is theoretically possible, by looking only at the prototypes. So it has enough information to notify people if they're breaking our tool.
Then, there's a question of how smart the default behavior should be. It's a matter of discussion how deep do we want to go, but I believe quality of life is important there. If our hypothetical library vendors can't figure out how to satisfy the attribute's prerequisites, they'll give up and go for __nofix which, again, diminishes the usefulness of our tool.
Then later we'll probably add attribute arguments through which the user can provide more hints to the compiler, eg. if it couldn't match parameters (exact syntax TBD):
[[clang::unsafe_buffer_usage(buf=sp)]] void foo(int *buf, size_t size); void foo(std::span<int> sp); foo(buf, size); // assisted automatic fix: foo(std::span(buf, __placeholder__));
Or if the safe function isn't an overload:
[[clang::unsafe_buffer_usage(foo=foo2)]] void foo(int *buf, size_t size); void foo2(std::span<int> buf); foo(buf, size); // assisted automatic fix: foo2(std::span(buf, __placeholder__));
We'll see how deep do we want to dig into this. Obviously, for all these cornercases, we'll have to teach the fixit machine to handle them, otherwise it's pointless to support them. It might be beneficial to consider a "full control" mode in which no automatic name matching of any kind takes place, for all the people who don't want to rely on the compiler to do magic for them.
clang/include/clang/Basic/AttrDocs.td | ||
---|---|---|
6212 | Looks like this part is an accidental duplicate of the one below. | |
clang/test/SemaCXX/warn-unsafe-buffer-usage-function-attr.cpp | ||
2 | Do you still need -Wno-deprecated-declarations? I think the test file doesn't have any deprecated declarations anyway. |
clang/include/clang/Basic/Attr.td | ||
---|---|---|
3971 |
Well, we restrict it to *functions*, but I'm not even sure this is necessary. It might make sense in some cases to have this attribute on, say, entire *classes*. But we can extend that later. Another good target for the attribute is Objective-C methods (which aren't FunctionDecls). Because Objective-C++ is a thing and folks who use it might be interested. But, again, we don't have any concrete data on that. |
I need a way better commit message... I have no idea what this is doing on first read. NoQ's comment is perhaps a good start.
That said, I don't like this form of incremental approach. Doing it this way means the additions you want to do could end up being breaking changes for folks. We should limit where/how this can be called, and with what parameters down to ONLY what functionality we're implementing with it, and add 1 of those functionalities, fully tested, at a time.
But allowing it to fully appertain to anything means that we are going to have a harder time changing what that menas in the future.
clang/include/clang/Basic/AttrDocs.td | ||
---|---|---|
6280 | This doc is woefully unspecified here. |
That said, I don't like this form of incremental approach. Doing it this way means the additions you want to do could end up being breaking changes for folks. We should limit where/how this can be called, and with what parameters down to ONLY what functionality we're implementing with it, and add 1 of those functionalities, fully tested, at a time.
Fair enough, we should probably avoid intentionally breaking people. So I guess starting with a
[[clang::unsafe_buffer_usage(__nofix, "use the bar() interface instead")]]
variant (where the hint is mandatory) would make sure we're starting with a very strict model that only gets more relaxed over time without breaking existing uses.
But we may also run into hypothetical situations when, say, more unforeseen data comes in from our users, and the data would indicate that we have to change something in a breaking way. Would such breakage also be completely unacceptable, or can we do that under certain conditions? Should we consider this possibility and prepare for it by, say, telling the users in documentation that the attribute is experimental and may break? (Of course, we'll do our best to eliminate that clause and stabilize the syntax as soon as possible.)
Telling users the attribute is experimental is an acceptable way to get around 'unforseen' issues, but we should still strive to make as few breaking changes as possible. That is, don't use said documentation as an excuse to break users arbitrarily, else you'll have a difficult time getting folks to use this/try this during its 'experimental' time. That is, every time you break it, you're giving existing users an opportunity to just say "eh, lets just remove all of these from our code, not worth the extra effort", and future users a reason to say, "eh, lets just wait until later (read: never) and see if it is worth using when it becomes more popular".
The summary says usage_buffer_usage, which really confused me. For such an invasive feature, I would expect a more detailed summary.
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
323–338 | In a nearby patch we've agreed to unhardcode/reuse the .bind tags like this! |
clang/lib/Sema/SemaDeclAttr.cpp | ||
---|---|---|
9335 | Nit: git just complained we should not have whitespace at the end of lines. |
Here's an update on how we plan to implement the attribute after considering feedback from everyone:
The semantics of our function attribute was originally "calls to this function are unsafe".
Now we are thinking about it more like "calls to this function are unsafe AND automatically fixable".
We propose a design where the attribute has parameters and the above is the default. The valid forms and how clang responds to them would be:
- Implementation Stage 1: [[clang::unsafe_buffer_usage]]
- Clang emits -Wunsafe_buffer_usage warning at call-sites, fixit emitted on best-effort basis.
- Implementation Stage 2: [[clang::unsafe_buffer_usage(arguments for fixit generation...)]]
- Example of the parameter use-cases: foo/bar (allows to emit a fixit that calls function bar instead of foo, another example is when the user changes parameters of the safe overload (e. g. remove a “size” parameter).
- Fixits are still on best-effort basis.
- Implementation Stage 3: [[clang::unsafe_buffer_usage(__no_autofix)]]
- Clang emits a different warning (TBD) at the attribute if we think we won’t be able to emit a fixit for call-sites (when we can’t find an overload with exactly 1 different parameter type changed from T* to std::span<T>).
- __no_autofix parameter suppresses that warning.
Note that the TBD warning is disabled by default. In order to see these warnings the users not only have to opt in into using the attribute (by turning on -Wunsafe_buffer_usage), but also opt in into seeing this specific (TBD) warning group.
With this strategy, we think we can avoid the situation where we introduce changes that will break code for people who adopted a previous version of the attribute.
We're curious to see what everyone thinks of this incremental strategy.
Can you add more details to that proposal? This is a 'devil in the details' sorta thing for me. Else, feel free to start implementing stage-1 (I suspect you've proposing what I commented on earlier), and I can comment on it there.
Yeah so the change of plans compared to my initial take mostly boils down to this part:
Why do I think the default mode should provide a fix? Imagine you're a library vendor, and you're updating your APIs to conform to the Safe Buffers programming model. Then you may add an attribute, but you don't necessarily have an easy way to test whether the compiler will emit an automatic fix or not. And if you forget to test that, it silently breaks our entire tool, because any user code can no longer be modernized automatically. Or if the function definitions diverge and the original automatic fix can no longer be applied, that'd be another silent breakage scenario.
Which is why I think __nofix *cannot* be the default behavior. If folks are breaking our tool (that they clearly intended to support, given that they added the attribute in the first place), they should know about that and explicitly opt in. And the good news is, the compiler can fully figure out whether an automatic fix is theoretically possible, by looking only at the prototypes. So it has enough information to notify people if they're breaking our tool.
So we no longer believe this is true. We think that if somebody is *not* a library vendor, then for them the requirement to annotate everything that isn't autofixable may be overly tedious. So now we're thinking that the default, argument-less syntax
[[clang::unsafe_buffer_usage]]
should be always allowed and always mean only "this function is unsafe", regardless of whether the automatic fix is possible or sufficiently obvious from context.
Then eventually we'll implement a separate warning to verify that the automatic fix is possible and recommends adding extra information into the attribute if it's not sufficient as is:
Implementation Stage 3: [[clang::unsafe_buffer_usage(__no_autofix)]]
- Clang emits a different warning (TBD) at the attribute if we think we won’t be able to emit a fixit for call-sites
- __no_autofix parameter suppresses that warning.
Note that the TBD warning is disabled by default. In order to see these warnings the users not only have to opt in into using the attribute (by turning on -Wunsafe_buffer_usage), but also opt in into seeing this specific (TBD) warning group.
But it is going to be a separate warning that requires separate opt-in from the user. So existing code that uses [[clang::unsafe_buffer_usage]] will not be "broken" by the introduction of this new warning (but only by the users opting into the extra thing), which (partially?) addresses the incremental development concerns that you voiced earlier. WDYT, would such landing strategy be acceptable?
Also, yeah, I think we should start properly documenting these things. It's arguably clear enough as of today, we can already write something down.
I'm unconcerned about people being broken by a new warning. I suggest that the overall 'Wunsafe-buffer-usage' (or whatever the top-level warning group is) ALWAYS include every warning (with sub-warning-classes to disable individuals if you'd like). The whole point of new warnings is to fire on existing code as QoI improves, so that is expected.
What is NOT acceptable to me is that once an attribute form is written, that it becomes ill-formed in some way. So allowing the attribute in the empty-form on every function (limited to functions, right?), then it cannot EVER become not-allowed/diagnosed for that. If you allow a parameter to it, that form can never be disallowed, etc.
My suggestion was to only add it to functions that you KNOW you can diagnose in the exact form you know you can diagnose, so you don't get stuck in a place where you cannot make something illegal. I believe this is WORSE in cases where you allow parameters without giving immediate meaning to them, but is worth making sure you DO want to someday allow this in the same form on every function. So, are you ok with this being on C variadics, template variadics, prototypeless functions, etc?
I think we're on the same page here, with our updated proposal. We're going to permanently allow the attribute in its current form as proposed by this patch, on arbitrary functions. Some code will have it written this way and that's perfectly ok for that code to not be concerned about automatic fixability at all. We're never going to make it illegal.
I also don't see any problems with the special function kinds you've listed. It's perfectly reasonable to let the developer annotate the function as "Well, this thing is using some unsafe buffers here and there, and I refuse to specify how or where exactly", no matter how the function is declared and what information about the function is available or how much flexibility it offers. Such annotation is perfectly acceptable and sufficient in many cases. Then anybody who uses this function will know that they have to avoid it if they're opting into the safe buffers programming model (but not if they aren't opting into it), but it's likely that they'll have to figure out themselves what to replace it with (which is in many cases inevitable anyway).
With templates, it probably makes sense to test the behavior of the attribute on explicit instantiations. Eg., https://godbolt.org/z/djbTqbYET but with [[clang::unsafe_buffer_usage]] instead of [[deprecated]] - see how it only warns about the second call, we probably want the same behavior. Though it's probably the default behavior anyway.
Yeah, I think this is reasonable, maybe it's not that important to keep them completely separate 🤔 Even if the new warning gets enabled unnecessarily on some code, the user will immediately know how to disable it without disabling the entire machine. I think we'll indeed go with a subgroup.
Aha, great, you reused a chunk of D136811 to document the attribute. I have a few suggestions how to make it make sense to the reader who didn't read the documentation in D136811, because even when D136811 lands there will still be people who learn about the attribute for the first time from https://clang.llvm.org/docs/AttributeReference.html and not from the document in D136811.
clang/include/clang/Basic/AttrDocs.td | ||
---|---|---|
6279–6282 | (the last part is carried over from below because it sounds important enough to say up front) | |
6297–6302 | ||
6312–6315 | ||
6324 | Looks like a formatting typo (those pesky review comments probably got in the way). | |
6332–6336 | Hmm, this part is hard to understand even in the full context of D136811, so that's on me. Let me try to do a better job here. I moved the part about escape hatch to the beginning of the document because it directly describes what the attribute does, which is important to say up front in this kind of documentation. | |
6338–6347 | I think we can drop this section entirely. It's mostly motivational, doesn't add much. | |
6349 | This TODO can be removed, because there are no parameters yet. When we implement the parameters, we'll still need to document them. |
Ok with the new documentation I think this patch is mostly good to go! @erichkeane WDYT, are we missing anything?
At a glance it looks like we're short on tests. We might be happy about the default behavior of everything, but it's important to confirm that we actually understand them, by adding tests that document the intended behavior and protect it from regressing. So I think it's worth it to add the cornercases Erich mentioned:
as well as the cornercase I brought up in response:
and probably interactions with C++ inheritance (if the attribute is on the base class method, is the overriding method also warned about?, etc.).
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
392 | Formatting ^.^ | |
393–395 | This FIXME is a bit stale because after D140062 WarningGadgets aren't responsible for emitting fixits. |
At some point we'd want to add this to release notes just like we did with D135851. But we probably want to do that for the entire -Wunsafe-buffer-usage feature, not for individual elements like this attribute. I think I'll add a release note to D136811 so that it landed at the same time as we start considering the feature is "usable" (for that we need to have an initial implementation of all the enforcement and opt-out elements - warnings, pragmas (D140179), the attribute, but not necessarily fixits).
Ok, let's add a virtual method / inheritance test and IMHO we're good to go!
clang/test/SemaCXX/warn-unsafe-buffer-usage-function-attr.cpp | ||
---|---|---|
22–23 | Hmm, maybe we should have a specialized warning/note text for these functions? Passing a pointer into these functions isn't "pointer operation" in and of itself. Maybe we should teach the handler class to say something like warning: function 'deprecatedFunction3' introduces unsafe buffer manipulation or when we have a variable: warning: 'z' is an unsafe pointer used for buffer access note: passed into 'deprecatedFunction4' here, which uses the buffer unsafely ? (in this case 'z' isn't a pointer but you get the point) (could be a follow-up patch) |
As for me, this looks like it's good to go! Thanks Rashmi for addressing all the comments!
Well, we restrict it to *functions*, but I'm not even sure this is necessary. It might make sense in some cases to have this attribute on, say, entire *classes*. But we can extend that later.
Another good target for the attribute is Objective-C methods (which aren't FunctionDecls). Because Objective-C++ is a thing and folks who use it might be interested. But, again, we don't have any concrete data on that.