This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Allow to diagnose the references to std::vector<T> with incomplete T
Needs ReviewPublic

Authored by ilya-biryukov on Aug 31 2022, 8:20 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

Add a warning -Waccess-vector-incomplete-member.

Inspired by recent changes to libc++ that make some uses actually fail
in C++20 mode, e.g. calls to size() started producing errors in cases
that were fine before.

These accesses are explicitly forbidden by the standard, but end up
working in existing implementations.

The warning is disabled by default, it is intended to be enabled for
finding instances of the code that can potentially break. We intend to
use it to find candidates for cleanup before the C++20 transition.

Diff Detail

Event Timeline

ilya-biryukov created this revision.Aug 31 2022, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 8:20 AM
ilya-biryukov requested review of this revision.Aug 31 2022, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 8:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

This warning might be a little verbose and pedantic for Clang, but I still wanted to share the code.
I would be more comfortable doing this a clang-tidy check, but I don't think there is an easy way to write it outside Sema and Sema is not readily available for clang-tidy.

I will try to find ways to patch this in locally for my use-case, but also happy to submit this if others think it is useful.

[...] I don't think there is an easy way to write it outside Sema and Sema is not readily available for clang-tidy.

What information is missing, exactly?

[...] I don't think there is an easy way to write it outside Sema and Sema is not readily available for clang-tidy.

What information is missing, exactly?

Whether a type was incomplete at a certain point in code.

struct IncompleteAtUse;
std::vector<Incomplete> x; // want to catch this, not visible in the resulting AST.
struct IncompleteAtUse {};
shafik added a subscriber: shafik.Aug 31 2022, 8:38 PM

Do you have an idea of how common this might be?

Do you have an idea of how common this might be?

I don't have the numbers yet, but this broke the builds in C++20 mode inside the core libraries we are using after libc++ change that makes vector constexpr.
I suspect every medium-to-large codebase will have a few instances.
One common pattern that I have seen goes something like:

// foo.h
struct Inner;
class Outer {
  std::vector<Inner> elements;
private:
  size_t size() { return elements.size(); }
  void some_other_method();
};

struct Inner {
  Outer* outer;
};

// foo.cpp
Outer::some_other_method() { /* uses members that would actually fail with incomplete type */ }
shafik added a comment.Sep 1 2022, 8:32 PM

Do you have an idea of how common this might be?

I don't have the numbers yet, but this broke the builds in C++20 mode inside the core libraries we are using after libc++ change that makes vector constexpr.
I suspect every medium-to-large codebase will have a few instances.
One common pattern that I have seen goes something like:

// foo.h
struct Inner;
class Outer {
  std::vector<Inner> elements;
private:
  size_t size() { return elements.size(); }
  void some_other_method();
};

struct Inner {
  Outer* outer;
};

// foo.cpp
Outer::some_other_method() { /* uses members that would actually fail with incomplete type */ }

This old cfe-dev thread might be relevant: https://lists.llvm.org/pipermail/cfe-dev/2018-June/058338.html

It appears that folks take advantage of this working a lot. It sounds like your saying w/ constexpr std::vector this would be ill-formed in a constant expression context now. Which won't effect most of the existing code I think?

any chance of generalizing this beyond a special case for std::vector - an attribute we could add to any class template (perhaps particular template parameters to the class template) to document the requirement? (is std::vector the only type in the standard library that has this property, or only the one people trip over most often)

This old cfe-dev thread might be relevant: https://lists.llvm.org/pipermail/cfe-dev/2018-June/058338.html

Thanks, the thread is definitely relevant.

It appears that folks take advantage of this working a lot. It sounds like your saying w/ constexpr std::vector this would be ill-formed in a constant expression context now. Which won't effect most of the existing code I think?

Most of the code is not affected, but some code is and large codebases will definitely see quite some breakages.
Note that this warning diagnoses is very pedantic in terms of implementing the rules from the standard. It fires in many places than don't actually break in libc++ and C++20 mode. On the flip side, it is really easy to implement.

any chance of generalizing this beyond a special case for std::vector - an attribute we could add to any class template (perhaps particular template parameters to the class template) to document the requirement? (is std::vector the only type in the standard library that has this property, or only the one people trip over most often)

It should be easy to generalize, e.g. we could introduce an attribute to mark classes that want this behavior, something along the lines of:

// Option 1. Less general, easy, small overhead.
template <class T> struct [[member_access_requires_complete(T)]] my_array {};
// Option 2. More general, might be high-overhead depending on the actual expression.
template <class T> struct [[member_access_requires(sizeof (T))]] my_array {}; // checks for validity of expression on every member access?

I am not sure this carries it weight for that particular use-case and there are other things to consider, e.g. how to report error messages.
What is definitely useful and cheap is adding these warnings for more STL containers that have this requirement.

We discussed this with @aaron.ballman in the last Clang C++ Language WG meeting and one idea that popped up is implementing a warning in Clang, but never actually surfacing it in the compiler, only in clang-tidy. I would be very comfortable making a clang-tidy warning like this as the bar is much lower than the compiler warnings.
My proposal would be to allow making warnings that are disabled by default and even with -Weverything, the only way to enable them would be to pass -W<warning-group> explicitly or run a corresponding clang-tidy check. Alternatively, there will be no way to enable the warning in the compiler at all and the clang-tidy would be the only way to surface it.

Obviously, these use-cases should be rare as most things can be implemented directly in clang-tidy, but in that particular case we need access to the compiler state in the middle of processing the TU, something that clang-tidy APIs do not provide. We should also be careful to make sure any warnings like this do not hurt performance of the compiler.

@aaron.ballman does that look reasonable? I am about to try prototyping this. Are there any open questions we might want to agree on beforehand?

CC @erichkeane for attribute questions, and CC @ldionne for STL questions and design/usability of the proposed attribute

This old cfe-dev thread might be relevant: https://lists.llvm.org/pipermail/cfe-dev/2018-June/058338.html

Thanks, the thread is definitely relevant.

It appears that folks take advantage of this working a lot. It sounds like your saying w/ constexpr std::vector this would be ill-formed in a constant expression context now. Which won't effect most of the existing code I think?

Most of the code is not affected, but some code is and large codebases will definitely see quite some breakages.
Note that this warning diagnoses is very pedantic in terms of implementing the rules from the standard. It fires in many places than don't actually break in libc++ and C++20 mode. On the flip side, it is really easy to implement.

I assume the standard rules you're talking about are: http://eel.is/c++draft/containers#vector.overview-4 (and other instances of similar wording for other containers)? If so, shouldn't this requirement be addressed by the STL implementation and not the compiler?

any chance of generalizing this beyond a special case for std::vector - an attribute we could add to any class template (perhaps particular template parameters to the class template) to document the requirement? (is std::vector the only type in the standard library that has this property, or only the one people trip over most often)

It should be easy to generalize, e.g. we could introduce an attribute to mark classes that want this behavior, something along the lines of:

// Option 1. Less general, easy, small overhead.
template <class T> struct [[member_access_requires_complete(T)]] my_array {};
// Option 2. More general, might be high-overhead depending on the actual expression.
template <class T> struct [[member_access_requires(sizeof (T))]] my_array {}; // checks for validity of expression on every member access?

I am not sure this carries it weight for that particular use-case and there are other things to consider, e.g. how to report error messages.
What is definitely useful and cheap is adding these warnings for more STL containers that have this requirement.

We discussed this with @aaron.ballman in the last Clang C++ Language WG meeting and one idea that popped up is implementing a warning in Clang, but never actually surfacing it in the compiler, only in clang-tidy. I would be very comfortable making a clang-tidy warning like this as the bar is much lower than the compiler warnings.
My proposal would be to allow making warnings that are disabled by default and even with -Weverything, the only way to enable them would be to pass -W<warning-group> explicitly or run a corresponding clang-tidy check. Alternatively, there will be no way to enable the warning in the compiler at all and the clang-tidy would be the only way to surface it.

Obviously, these use-cases should be rare as most things can be implemented directly in clang-tidy, but in that particular case we need access to the compiler state in the middle of processing the TU, something that clang-tidy APIs do not provide. We should also be careful to make sure any warnings like this do not hurt performance of the compiler.

@aaron.ballman does that look reasonable? I am about to try prototyping this. Are there any open questions we might want to agree on beforehand?

An attribute is an interesting idea, but I'm not certain it's fully necessary; I think it's effectively a slightly nicer form of: https://godbolt.org/z/ha6Tb1s4M

The reason why I'm wondering whether libc++ should handle this is: the requirements are library requirements, not compiler requirements, so I think the library should decide whether to enforce this rule or not. An attribute is an interesting idea in so much as it may help the library authors to implement this more easily, but the downside to the attribute is that it won't help libc++ out too much because they still need to work with other compilers that don't support the attribute.

Just my .02, but I am conflicted between:

  1. Simply not doing anything -- the diagnostic users get when they violate the requirement currently is probably not that bad? I did see this breakage a bit in our internal code base as well, but it was easy to fix and there were not many instances.
  2. Adding the attribute that was suggested and using it in libc++. On compilers that don't support the attribute, we'd simply be less pedantic.

The one thing I'd rather not do is static_assert(__is_complete<_Tp>::value) in all the std::vector member functions, IMO that adds complexity and reduces readability for a really marginal gain.

Just my .02, but I am conflicted between:

  1. Simply not doing anything -- the diagnostic users get when they violate the requirement currently is probably not that bad? I did see this breakage a bit in our internal code base as well, but it was easy to fix and there were not many instances.

I think the issue this is solving is that we are not diagnosing the library requirement properly because we would miss cases where the incomplete type wasn't actually used by the member function (aka, it wasn't caught by any constant expression evaluation requirements in the compiler).

  1. Adding the attribute that was suggested and using it in libc++. On compilers that don't support the attribute, we'd simply be less pedantic.

The one thing I'd rather not do is static_assert(__is_complete<_Tp>::value) in all the std::vector member functions, IMO that adds complexity and reduces readability for a really marginal gain.

Because this is a libc++ requirement, I will defer to what you think is best here. The only danger I see of not diagnosing this is that users may potentially run into a portability issue to another STL, which might make sense as a clang-tidy warning, but seems like it'd be pretty low-value as I suspect most STL implementations rely on the compiler to issue a diagnostic for the incomplete type when it matters and so all the implementations are likely to behave roughly the same in terms of behavior. So I'm leaning towards the "do nothing" option.

I'm not certain an attribute makes a whole lot of sense given that the requirement is pretty specific to the STL.