Details
- Reviewers
aaron.ballman - Group Reviewers
Restricted Project - Commits
- rG47ccfd7a89e2: [Clang] Implement P2741R3 - user-generated static_assert messages
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/AST/ExprConstant.cpp | ||
---|---|---|
16387 | Any reason the Expr* pointers here can't be const? |
clang/lib/AST/ExprConstant.cpp | ||
---|---|---|
16387 | Nope!. Note that I'm not entirely sold on this interface, having a function in Expr taking a bunch of function pointers does not feel amazing, |
Rebase.
This is now ready for review.
Note that after discussion with CWG, the consensus seems to be that the wording
is fine, an implementation has to behave
As if the full expression Message.data()[I] is called for each character.
Obviously this is not a viable implementation strategy so we materialize
intermediate expressions.
clang/include/clang/Sema/Sema.h | ||
---|---|---|
3883 | Appears unused. | |
clang/lib/AST/ExprConstant.cpp | ||
16413 | This relies on host's CHAR_BIT >= target's CHAR_BIT, which isn't true for my target. Could you add an assertion? | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
16960 | ||
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
127 | Should there be (negative?) tests with non-const data/size members and incorrect number of parameters? These conditions are checked in FindMember. |
According to the current wording, the static_assert-message is either unevaluated string or an expression evaluated at compile time.
Unevaluated strings don't allow certain escape sequences, but if I wrap the string in a string_view-like class, I'm allowed to use any escape sequeces, including '\x'.
Moreover, wrapping a string in a class would change its encoding. Unevaluated strings are displayed as written in the source (that is, UTF-8), while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) and then printed in system locale, leading to mojibake.
Not quite.
Unevaluated strings are always UTF-8 ( regardless of source file encoding). Evaluated strings are in the literal encoding which is always UTF-8 for clang.
This will change whenever we allow for different kinds of literal encodings per this RFC https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1
If and when that is the case we will have to convert back to UTF-8 before displaying - and then maybe convert back to the system locale depending on host.
Numeric escape sequences can then occur in evaluated strings and produce mojibake if the evaluated strings is not valid in the string literal encoding.
I don't believe that we would want to output static messages without conversion on any system as the diagnostics framework is very much geared towards UTF-8 and we want to keep supporting cross compilation.
So the process will be
source -> utf8 -> literal encoding -> utf8 -> terminal encoding.
By the same account, casting 0-extended utf-8 to char is fine until such time clang support more than UTF-8. (which is one of the reasons we need to make sure clang conversions utilities can convert from and to utf-8)
Unevaluated strings were introduced in part to help identify what gets converted and what does not.
Thanks for your reply, I think I see the idea.
By the same account, casting 0-extended utf-8 to char is fine until such time clang support more than UTF-8. (which is one of the reasons we need to make sure clang conversions utilities can convert from and to utf-8)
Unevaluated strings were introduced in part to help identify what gets converted and what does not.
It is a bit strange that the string in static_assert(false, "й") is not converted, while it is converted in static_assert(false, std::string_view("й")).
It might be possible to achieve identical diagnostic output even with -fexec-charset supported (which would only affect the second form),
but right now I'm confused by the distinction… Why don't always evaluate the message?
but right now I'm confused by the distinction… Why don't always evaluate the message?
2 reasons
- it would be a rather important breaking change for compiler who don't always use utf-8 at their literal encoding
- we do not want to limit static_assert to the capabilities of the literal encoding.
That oddity (there is no denying it's odd), was discussed by the C++ committee and we decided this was fine.
However we did also consider supporting char8_t* in data, which would allow users to use evaluated utf-8
strings.
We did decide against doing it right now but i suspect it will happen at some point.
Address Sergei's feedback
- Add more tests
- Support non const member functions
- Make sure diagnostics messages are never produced twice
- Support returning intermediate objects from data()/size()
(Not a full review, I ran out of steam -- I wanted to get you some feedback that I already found though.)
clang/include/clang/AST/Expr.h | ||
---|---|---|
766–767 | The function name confuses me a bit because a char pointer doesn't have a size expression. I was thinking "EvaluateCharArrayAsString" but that's also not right. | |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
86 | This should also be re-flowed to 80 columns. | |
1545–1546 | And re-flow to 80 columns. You should make similar changes in the other new diagnostics as well. (Use single quotes around syntax elements, start the string on its own line rather inline with the def, use "static assertion" instead of "static_assert" (The last point is largely because C has both _Static_assert and static_assert and we're avoiding figuring out which one was used. It may be moot as this is C++-only functionality, but it is more consistent with other static assert diagnostics.) | |
clang/lib/AST/ExprConstant.cpp | ||
16413 | Wouldn't adding the assertion cause you problems then? (FWIW, we only support CHAR_BIT == 8 currently.) | |
16417–16418 | Rather than use an RAII object and destroy it manually, let's use {} to scope the RAII object appropriately. | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
16922–16927 | It might be more friendly to tell the user which was missing, size or data. |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
---|---|---|
168 | I feel like this diagnostic could be more detailed. Looking at the implementation you could at least point out which one is invalid if not both and you could probably add more details on top of that but I think at least pointing out which is invalid would be helpful. |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
---|---|---|
2 | How about a test case w/ data() and or size() has default arguments. |
clang/lib/AST/ExprConstant.cpp | ||
---|---|---|
16413 | It will, the assertion will help find this place. |
- Rename EvaluateCharPointerAsString => EvaluateCharRangeAsString
- Improve diag messages
- Form an overload set to diagnose invalid overloads
- Add more tests
clang/lib/AST/ExprConstant.cpp | ||
---|---|---|
16413 | I replied to that in one of the comment i made on the review, but CHAR_BIT is not relevant there. That being said I realized while replying that there is a bug there: it should cast to unsigned. | |
16417–16418 | This seems to be the way to use this interface - mostly because destroy can fail and we want to detect that |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
1553–1558 | How about we combine these into one diagnostic given how similar and related they are? | |
clang/lib/AST/ExprConstant.cpp | ||
16417–16418 | Oh, what a strange thing to call RAII! :-D | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
16888–16899 | ||
16902 | ||
16942–16943 | ||
16960 | Can be re-flowed to 80 columns, and I'll stop pointing out changes for this one -- just be sure to replace everything with Loc that you can. | |
16999–17000 | Errrr, this seems like a pretty big surprise because evaluating that char range can fail. This means: constexpr struct { size_t size() const; const char *data() const; } d; static_assert(false, d); // Fails, not because of the static assertion but because d is not valid static_assert(true, d); // Succeeds, despite d not being valid (Regardless of what we land on for a direction, I think this is a good test case to add.) I don't know that this is reasonable QoI. Most static assertions pass rather than fail, so I think this will hide bugs from users in unexpected ways. I understand there may be concerns about compile time overhead, but those concerns seem misplaced in the absence of evidence: the extra overhead is something the user is opting into explicitly by using constexpr features and constexpr evaluation is well known to slow down compile times. Further, given that many (most?) static assert messages are relatively short (e.g., not kilobytes or megabytes of text), that extra overhead is likely negligible to begin with in this case, at least on an individual assert basis. The downside is, because most static assertions pass, that evaluation is also unnecessary in most cases, and it could add up if there's a lot of static assertions. However, given that users add static assertions to tell them when code is incorrect (assumptions invalidated, etc), it seems pretty counter-productive to hide bugs within the static assertion itself. What do others think? @hubert.reinterpretcast @tahonermann @erichkeane @ldionne @shafik If there's some agreement that we'd like to see diagnostics here, my initial preference is for an error diagnostic because a static assertion is only useful if both operands are valid constant expressions, but I could probably live with a warning. However, I think that an error would require filing a DR with WG21 otherwise there are conformance issues. In the meantime, to keep this review moving without waiting for committee deliberations, I think we can evaluate the constant expression to determine it's validity and report issues as a warning (potentially one that defaults to an error) if others agree that a diagnostic is warranted here. | |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
89–99 | Use of true here means that we missed the fact that MessageFromConvertible() is invalid; not only are the size and data methods not constexpr, the ConvertibleToInt and ConvertibleToCharPtr classes do not have constexpr conversion operators. That could be intentional, but we should definitely have a false test that tests the conversion operators are properly called and work as expected when they're correct. | |
111–112 | This diagnostic combination is a little bit jarring to me. The first one is telling the user that the message is incorrect and the second one tells the user precisely what that message was, so they're almost contradictory in some ways. However, I'm not certain if others think this is jarring as well. If we think that, perhaps this sort of circumstance should just say the static assertion failed without the message part, even though we could figure out something as the message in this particular instance. |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
---|---|---|
111–112 | I'm mostly neutral on other comment but i disagree with that. because data() is a pointer there are a few cases where we can form a diagnostic despite errors: Other cases include recoverable errors from evaluating the message, data or size(), for example https://godbolt.org/z/aj4aKMMqn Note that all of these were discussed at length in WG21, and well, whether a message is produced or not is QOI, so whatever we do there is conforming, i just think we should provide as useful messages as we can. |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
---|---|---|
111–112 | Okay, I can see the logic there. My concern was mostly around whether we can display the message in all cases or not. e.g., can the constexpr evaluation fail such that the data returned is effectively garbage and thus we display gibberish to the user? e.g., https://godbolt.org/z/Wx74as4Gh (This one doesn't show gibberish, but... we didn't exactly find a message either.) |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
---|---|---|
241 | Another test case to consider adding: consteval const char *oops() { const char *ptr; // Garbage return ptr; } static_assert(false, oops()); // This should get the static assertion failure and the invalid message failure (I think this holds because "M.data(), implicitly converted to the type ”pointer to const char”, shall be a core constant expression and let D denote the converted expression," and http://eel.is/c++draft/expr.const#5.8 means this wouldn't qualify as a core constant expression, I believe. At least, that's how non-static assert code behaves: https://godbolt.org/z/aKrfhr9Pj) |
@aaron.ballman @shafik Thanks! I'll address the small nitpicks one we all agree on a direction on whether to have a warning/error/nothing on static_assert(true, non_constant_expression)
clang/lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
16999–17000 |
It might be true that they usually pass, but as assertions, their purpose is to eventually fail. Just like the runtime assertions in clang usually pass but also fail all the time (see the bug tracker). Once they fail, it would be pretty disappointing if producing the error message itself would produce yet another error message... | |
17009–17010 |
clang/lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
16999–17000 |
That's a good way of restating why I don't think this is good QoI. |
clang/lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
17032 | Should work, shouldn't it? |
- Display a warning, defaulted as error, if the message cannot be evaluated in
a static assertion that does not fail
- Address Shafik's, Timm's and Aaron's feedbacks
This is looking pretty close to good, but there were some outstanding questions still.
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
86 | It looks like these edits were missed. | |
clang/lib/AST/DeclPrinter.cpp | ||
952–954 | Should there be an else clause here that does D->getMessage()->prettyPrint() so that we print out *something*? Otherwise, there should be a fixme comment about not printing out non-string literal operands. | |
clang/lib/AST/ExprConstant.cpp | ||
16408 | I think the caller already validates that we have a pointer to the correct kind of string, so I kind of think the isInt() check is unnecessary. If it's not, can you show a test case where it kicks in? | |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
270 | One more test case to consider: do we properly handle dependent expressions in the assert itself? e.g., template <typename Ty> struct S { static_assert(false, Ty{}); }; struct Frobble { constexpr int size() const { return 5; } consterxpr const char *data() const { return "hello"; } }; S<Frobble> s; // Good, fails with "hello" S<int> t; // Incorrect |
- Add test for dependent message
- always dump the message even if it's not a string literal
- remove superflous isInt
- fix diag message
clang/lib/AST/ExprConstant.cpp | ||
---|---|---|
16408 | I was thinking of indeterminate values but i can't find a way to get into that situation, so i think this is superfluous indeed |
clang/test/SemaCXX/static-assert-cxx26.cpp | ||
---|---|---|
270 | note that i did split the test in two to better show the appertainance of each diag. |
The function name confuses me a bit because a char pointer doesn't have a size expression. I was thinking "EvaluateCharArrayAsString" but that's also not right.