This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Add T* -> span<T> Fix-Its for function parameters
ClosedPublic

Authored by ziqingluo-90 on Jan 31 2023, 11:35 PM.

Details

Summary

Generate fix-its for function parameters. Currently, the analyzer fixes one parameter at a time.
Fix-its for a function parameter includes:

  1. Fix the parameter declaration of the definition, result in a new overload of the function. We call the function with the original signature the old overload.
  2. For any other existing declaration of the old overload, mark it with the [[unsafe_buffer_usage]] attribute and generate a new overload declaration next to it.
  3. Creates a new definition for the old overload, which is simply defined by a call to the new overload.

Example,

void foo(int *p);

void foo(int *p) {
   int x = p[5];
}

will be like the following after applying fix-its

[[unsafe_buffer_usage]] void foo(int * p);
void foo(std::span<int> p);

void foo(std::span<int> p) {
...
}
[[unsafe_buffer_usage]] void foo(int * p) {
  return foo(std::span<int>{p, <# size #>});
}

Diff Detail

Event Timeline

jkorous created this revision.Jan 31 2023, 11:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2023, 11:35 PM
jkorous requested review of this revision.Jan 31 2023, 11:35 PM
jkorous planned changes to this revision.
jkorous added inline comments.Jan 31 2023, 11:50 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp
10

Oh, yes, this is fun. I hope I will eventually convince FileCheck this C++ attributes are not substitution blocks 😅

https://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-string-substitution-blocks

jkorous updated this revision to Diff 494147.EditedFeb 1 2023, 7:36 PM
  • rebased
  • moved compatibility decl/def below original

still see crash in tests

jkorous retitled this revision from [-Wunsafe-buffer-usage][WIP] Fix-Its for T* -> std::span<T> function parameters to [-Wunsafe-buffer-usage][WIP] Add T* -> span<T> Fix-Its for function parameters.Feb 1 2023, 11:52 PM
jkorous updated this revision to Diff 494177.Feb 1 2023, 11:58 PM
  • fixed all crashes in our tests
  • added cases of functions where we should not emit Fix-Its for parameters
jkorous updated this revision to Diff 496368.EditedFeb 10 2023, 1:18 AM

Started working on conflict detection for added compatibility overload.

Note: One thing that I have little confidence the current patch handles correctly would be qualified function names. Definitely need tests.

jkorous added inline comments.Feb 16 2023, 3:23 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
918

We should not use the attribute directly - instead we should have a macro that gates it with __has_cpp_attribute.
https://clang.llvm.org/docs/LanguageExtensions.html#has-cpp-attribute

jkorous marked an inline comment as not done.Feb 17 2023, 4:07 PM
jkorous added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
918

Also, we need to think about how do we vend the macro.

jkorous marked 2 inline comments as not done.Mar 1 2023, 3:39 PM
jkorous added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
899

Sorry, the example below is unrelated. What I meant is that we shouldn't for example use \n on Windows.
EDIT: We also realized during an offline discussion that we should have our tests reflect that (somehow).

As we discussed offline - we should add tests that our overload conflict detection would notice overloads that are declared below the function that's being analyzed.
Since our analysis callback is invoked after a body function has been parsed (but before the whole TU is parsed) it is not immediately clear if all function declarations in the TU will already be represented in the AST at that point.

ziqingluo-90 commandeered this revision.Mar 6 2023, 12:45 PM
ziqingluo-90 edited reviewers, added: jkorous; removed: ziqingluo-90.

Taking care of this patch on behalf of @jkorous

ziqingluo-90 removed a subscriber: ChuanqiXu.

Still work in progress (but comments are very welcomed) ---there are several problems need to be solved:

  • To conservatively detect any overload of the function whose parameter is being fixed. Thus far, the analysis cannot see any overload declared after the analyzing function.
  • Fix-its conflict. Insertions of #include<span> and [[clang::unsafe_buffer_usage]] can conflict if both are inserted at the beginning of a file.
  • There are many unsupported cases.
  • Do we really want to fix parameters one at a time?
  • As the fix-its become more complicated, tests now are harder to read.

I just discovered my original prototype had yet another bug - parameters that use "array syntax" would get spanified but their name would be missing.

E. g.
void foo(int param[]) {
would get transformed to (notice the missing param name:
[clang::unsafe_buffer_usage] void foo(std::span<int>) {

I am wondering what is the source range of the parameter type in this case - the original code likely expects parameter type to be left of parameter name which might or might not be true in case.

It is possible that @ziqingluo-90 has fixed the bug in the meantime but I don't see a test - can we add one just to be sure?

It is likely that Transformer might solve this problem out of the box - maybe we should use it for this patch after all.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-unsupported.cpp
63 ↗(On Diff #502893)

Since you mention main in the tests...
We can't change its signature. Not emitting Fix-Its (and not emitting the note to change param type) might be sufficient for now.
Long-term we might want a special Fix-It for argv - maybe construct std::span argv{argv_raw, argc}; local variable but that should be done in another patch.
Can you please add a test that we don't emit Fix-It for argv?

I found one more bug.

The prototype mishandles const qualifier of the parameter type which means it transforms const int* to const span<const int> which is incorrect. That would correspond to const int* const.

Can we add a test for this case?

ziqingluo-90 marked an inline comment as done.

Fix issues in transforming parameter declarations of the form const T * var and T * var[].
Add test for overload declarations appearing after the analyzing function with unsafe parameters.
Add more tests for unsupported cases so that we don't forget.

I just realized one more problem we have to address with this patch - default arguments.
ParmVarDecl::getDefaultArg should be able to tell us if in any of the declarations of the function there is a default argument for the parameter we want to transform. Then we need to figure out if we can fix it/how do we fix it and we shouldn't emit the fixit if we can't fix the default argument. This problem is pretty similar to local variable initialization and we should try to reuse the solution.

clang/lib/Analysis/UnsafeBufferUsage.cpp
875

This TODO will hopefully no longer be true after we land.
https://reviews.llvm.org/D146342

883

IMO we should wrap this in some macro to make it user-friendlier.

Something like:

#ifndef UNSAFE_BUFFER_USAGE
#if __has_cpp_attribute(clang::unsafe_buffer_usage)
#define UNSAFE_BUFFER_USAGE [[clang::unsafe_buffer_usage]]
#else
#define UNSAFE_BUFFER_USAGE 
#endif
#endif

so the code (from our fix-its or written by the users) can look like this:

UNSAFE_BUFFER_USAGE void unsafe_foo(int* ptr, unsigned size);

...and we need to figure where exactly should the macro live (or if there should be multiple definitions, etc.).

I would love to hear @NoQ's thoughts on this topic.

NoQ added inline comments.Apr 18 2023, 1:22 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
883

That's, uh, interesting. We want users to use a macro of this kind and we need the macro to be defined, just like we want the users to use span and we want <span> to be included. But unlike #include <span>, we *really* don't want users to define the macro manually every time they need it. Instead we want it defined manually in some centralized place, and have the fixit machine pick it up from there.

So I think the fixit machine should never suggest *definining* the macro. That's the opposite of what we want.

On the other hand, there's a moderately famous precedent of compiler picking up an existing macro, namely the attribute [[fallthrough]]:

So I think we can try to build something similar.

NoQ added a comment.Apr 18 2023, 1:41 PM

I just realized one more problem we have to address with this patch - default arguments.
ParmVarDecl::getDefaultArg should be able to tell us if in any of the declarations of the function there is a default argument for the parameter we want to transform. Then we need to figure out if we can fix it/how do we fix it and we shouldn't emit the fixit if we can't fix the default argument. This problem is pretty similar to local variable initialization and we should try to reuse the solution.

Oh right.

The problem with default arguments is that their AST lives outside of function body. It's the job of the *caller* to invoke this code. So that's a separate isolated piece of code that our matchers need to traverse together with the function's body.

I recommend giving up for the purposes of this patch (just make sure we don't try to fix such code) and see if we can address this later.

Which reminds me, we also need to handle CXXCtorInitializers correctly (or at least give up when we see them) - they're yet another piece of code that lives outside of function body, and they may access function parameters.

To conservatively give up on fixing function parameters if they have default arguments.

The following improvement will be added by follow-up patches:

  • Improve the insertion of [[unsafe_buffer_usage]]
  • Fix function parameters with default arguments
  • Group fix-its of multiple parameters
NoQ added inline comments.Apr 20 2023, 5:31 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
873–874

I suspect that it's always going to be "the whole TU", given that Sema is currently at the end of TU parsing (because that's where we put our analysis).

I recommend adding some tests with namespaces and other situations when functions are nested into something. Esp. when the out-of-line definition may be lexically available outside of the scope, eg.

namespace N {
  void foo();
}

void N::foo() {
  // implementation
}

where the compatibility overload has to go into the right namespace and the implementation of the compatibility overload has to be correctly prefixed with N::.

883

I think we have to go with raw [[clang::unsafe_buffer_usage]] for now. We can implement macro detection later, but we shouldn't insert preprocessor directives ourselves, because that's absolutely not how we want the final code to look.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span-overload.cpp
26 ↗(On Diff #515500)
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp
11

Is there a semicolon at the end?

Also, do we want to preserve the parameter name p here, given that it's available on the respective redeclaration?

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-unsupported.cpp
20 ↗(On Diff #515500)

Why, what's wrong with it?

ziqingluo-90 added inline comments.Apr 21 2023, 1:30 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp
11

No. There is no semicolon. The idea is that the fix-it tries to append a new function decl A; to a function decl B; but there is no obvious way to get the end location of B;. We only have the end location of B. So the fix-it inserts ;A to the end of B.

11

I did think about whether preserving the parameter name is necessary and didn't find a strong reason to do so.
A minor reason could be that if contracts are introduced in C++ one day, parameters in declarations need to have names so that contracts can refer to them.

I will add names to fix-its. Also I will be happy to hear any other benefits of doing that.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-unsupported.cpp
20 ↗(On Diff #515500)

For const int * const x (or int * const x), we transform it to const std::span<const int> x (or const std::span<int> x). The left-most const are missing in the fix-its.

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Add T* -> span<T> Fix-Its for function parameters to [-Wunsafe-buffer-usage] Add T* -> span<T> Fix-Its for function parameters.Apr 27 2023, 3:59 PM

Fix issues in overload lookup. Thanks to @NoQ 's comment.
Fix issues in handling qualified function or parameter names.
Add parameter names back for inserted overload declarations if the names exist in the original function declaration.

ziqingluo-90 marked 2 inline comments as done.May 4 2023, 5:18 PM
ziqingluo-90 added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
873–874

This is very valuable comment that points out a big hole in my patch. Now it should be fixed. Thank you.

NoQ added inline comments.May 22 2023, 12:41 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
878–880

Wait, did we decide to emit the #if check at every function after all?

I'd much rather temporarily emit the raw attribute without the check, than temporarily emit an #if, because the raw attribute is desired at least in some cases, but #if next to the function is never desired.

891

Need to test this with various anonymous functions:

959–960

What happens if these are in different FileIDs? Do we end up with a pair of pointers into unrelated buffers? (Eg., macro expansions, or #include in the middle of the function declaration.)

1062

Does this work correctly if the pointee type is an anonymous struct? It probably does (given that anonymous structs can't be used directly, and typedefs aren't skipped here).

I suspect it's more reliable to copy-paste the original spelling of the type from the source code than to rely on pretty-printing, but it could have other problems (eg., there's a typedef that covers the *).

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp
11

Parameter name could be a valuable hint to the user about the purpose of the parameter. The user doesn't necessarily see the implementation, and it's not like we're copying over the comments before the function. So out of the few ways the caller can figure out what to put there, this could be the most important one.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-unsupported.cpp
20 ↗(On Diff #515500)

Oh, right, yes, makes sense in both cases!

This doesn't really lead to broken code and it doesn't really sacrifice much const-correctness because const in this position applies only to the parameter's stack storage, which people don't usually overwrite anyway, given that it reliably disappears at end of function. In particular, it's completely immaterial to the caller.

(It could matter when the class does something non-trivial, like have weird global side effects in overloaded assignment operator, so if it's const it could act as a guarantee that these side effects don't happen no matter what the function does; but nobody writes code like this, and std::span specifically is definitely not written like this.)

But I totally agree that in the spirit of preserving maximum similarity to the original code, it's valuable to preserve const here.

NoQ added inline comments.May 22 2023, 12:43 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
1062

We should definitely add a test for this even if it's already working correctly, because people love their .getCanonicalType() and I can easily see somebody adding it to the code later for some unrelated reason. They need to know that typedefs can't be dropped here.

ziqingluo-90 marked 2 inline comments as done.May 22 2023, 2:58 PM
ziqingluo-90 added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
878–880

oh yes, I should have made the raw attribute the default case here. More advanced cases of dealing with attributes are addressed in this patch.

891

These cases are not supported in this patch. (see the guard at lines 1358--1366).

959–960

This is a good point! I should have just obtained text representation of each part (i.e., return type and function name) separately.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-unsupported.cpp
20 ↗(On Diff #515500)

Yes and thanks for the extra note.

Do you think const could also matter for pass-by-reference parameters? e.g.,
void f(int * const &p) -> void f(const std::span<int> &p)

I will add some tests for such cases

ziqingluo-90 marked an inline comment as done.May 23 2023, 3:27 PM
ziqingluo-90 added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
1062

I hadn't thought of anonymous structs. They can be a problem.
But I think the problem is that we cannot getPointeeType() from a pointer-to an anonymous type.
For example,

typedef struct {int x;} * ANON_PTR;
void foo(ANON_PTR p) {
  ... p[5];
}

One of the generated fix-its looks like this
std::span<struct (unnamed struct at clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp:7:9)> q

Actually, getCanonicalType() is not a problem in the cases that we can handle. For example,

typedef struct {int x;} ANON_S;
typedef ANON_S * ANON_PTR;

void foo(ANON_PTR p) {  ... }

getCanonicalType() returns ANON_S for the pointee type of the type of p. If ANON_S is defined by a named struct, getCanonicalType() returns the named struct.

Address comments and made the following changes to the patch:

  1. Stop using SourceManager::getCharacterData to obtain source file texts. Because it is possible that the start and end pointer point to different buffers. Instead, a helper function getRangeText is created.
  2. Stop using TypePrinter. Calling getAsString on a QualType uses the printer under the hood. The printer does not necessarily print a type t in the exact same text form as t appearing in the source file. Especially when macros or typedefs are used. We prefer to generate texts for types without any change to their original literal forms. A helper function getPointeeTypeText is created to generate text for types using TypeLocs to obtain source ranges of types. However, there are a few limitations:
    • TypeLoc does not provide source ranges of qualifiers. (There is a QualifiedTypeLoc that looks like an outlier to the rest of the TypeLoc sub-classes and seems fishy to me). We have to generate qualifier texts using a printer separately.
    • Currently, we cannot obtain the text of a pointee type from a parameter declaration, if the declarator is too complicated. It now only supports two forms of parameter declarations: T identifier or T identifier[], where T is any type.
    • If a pointee type is unnamed or has unnamed sub-types, we cannot guarantee that we can refer to that type. Although there are many cases where we still can refer to such an unnamed type, we give up for now.
ziqingluo-90 edited the summary of this revision. (Show Details)Jun 2 2023, 4:58 PM
NoQ accepted this revision.Jun 5 2023, 5:46 PM

Hi, looks great to me now. It looks like it's at a point where there could be more cornercases for us to tackle, but the rough idea looks correct and we can address the rest of them incrementally. Great job!!

A few microscopic nitpicks but generally I think this is ok to land.

clang/lib/Analysis/UnsafeBufferUsage.cpp
920

I'm a big fan of Present Simple all the way, unless there's a really good reason.

958–959

Slightly more C++-y this way.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp
47–49
This revision is now accepted and ready to land.Jun 5 2023, 5:46 PM

LGTM.

clang/lib/Analysis/UnsafeBufferUsage.cpp
1053

Minor nit: Changing function name to createOverloadsForFixedParams would need to reflect here as well.

ziqingluo-90 marked 3 inline comments as done.Jun 9 2023, 3:43 PM
This revision was landed with ongoing or failed builds.Jun 9 2023, 3:55 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2023, 3:55 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript