This is an archive of the discontinued LLVM Phabricator instance.

[clang] perform semantic checking in constant context
ClosedPublic

Authored by Tyker on May 16 2019, 7:23 AM.

Details

Summary

Since the addition of __builtin_is_constant_evaluated the result of an expression can change based on whether it is evaluated in constant context. a lot of semantic checking performs evaluations with out specifying context. which can lead to wrong diagnostics.
for example:

constexpr int i0 = (long long)__builtin_is_constant_evaluated() * (1ll << 33); //#1
constexpr int i1 = (long long)!__builtin_is_constant_evaluated() * (1ll << 33); //#2

before the patch, #2 was diagnosed incorrectly and #1 wasn't diagnosed.
after the patch #1 is diagnosed as it should and #2 isn't.

Changes:

  • add a flag to some Expr::Evaluate* operation to evaluate in constant context.
  • in SemaChecking.cpp calls to Expr::Evaluate* are now done in constant context when they should.
  • in SemaChecking.cpp diagnostics for UB are not checked for in constant context because an error will be emitted by the constant evaluator.
  • in SemaChecking.cpp diagnostics for construct that cannot appear in constant context are not checked for in constant context.
  • in SemaChecking.cpp diagnostics on constant expression are always emitted because constant expression are always evaluated.
  • semantic checking for initialization of constexpr variables is now done in constant context.
  • adapt test that were depending on warning changes.
  • add test.

Diff Detail

Repository
rL LLVM

Event Timeline

Tyker created this revision.May 16 2019, 7:23 AM
Tyker added a reviewer: rsmith.
Tyker set the repository for this revision to rC Clang.
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2019, 7:23 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Tyker edited the summary of this revision. (Show Details)May 16 2019, 7:24 AM
Tyker updated this revision to Diff 199830.May 16 2019, 7:57 AM

The approach we've been taking for this up until now is to use a ConstantExpr node to mark the entry point of a constant context; it looks like that would continue to work here, but we're missing those nodes in some cases? (This is preferable to using a flag because it also -- eventually -- gives us a place to store the evaluated value, which we're going to need for various upcoming features, particularly consteval.)

Tyker added a comment.EditedMay 16 2019, 12:16 PM

i added this bit because when we eventually store the result of evaluation in ConstantExpr we will probably want to trail-allocate the APValue's possible representations separately to limit memory consumption. but if we do this the ConstantExpr node will need to be created after evaluation of the value to allocate the right space and we will need an other way to mark that the expression should be evaluated in constant context.
if we won't store APValue's possible representations separately in trail-allocate space, wrapping the expression in a ConstantExpr is the right solution.

i added this bit because when we eventually store the result of evaluation in ConstantExpr we will probably want to trail-allocate the APValue's possible representations separately to limit memory consumption. but if we do this the ConstantExpr node will need to be created after evaluation of the value to allocate the right space and we will need an other way to mark that the expression should be evaluated in constant context.

Even if we want to tail-allocate (which does seem like a nice idea in the long term), we don't need to mark the expression for that: we can pass a flag into the constant evaluator to indicate that we're evaluating an expression with the intent of forming a ConstantExpr. (We already do that for EvaluateAsRValue; we'll need to do the same for other entry points that could be evaluating at the top level in a constant context.)

Tyker added a comment.EditedMay 17 2019, 4:47 AM

I checked how we can propagate the information about constant context through semantic checking.
there are issues with using ConstantExpr to mark expressions as constant for semantic checking:

  • #1 multpile Expr::Ignore* operation remove ConstantExpr from the expression.
  • #2 Semantic checking Traverses the AST so all methods that only mark the top-level Node will not completely work.
  • #3 at short term: adding ConstantExpr modifies the AST structure, so adding it everywhere it is needed for semantic checking will require changing code that depend closely on the AST structure.

Note: the limitation #2 also applies to the bit in ExprBitfield solution in its current form.

propagating constant context to the expression evaluator via a boolean value will be a lot of boilerplate and in my opinion this should be propagated more "automatically".
so I think the best solutions are:

  • push a ExpressionEvaluationContext::ConstantEvaluated so Sema will propagate it and propagate from Sema to the expression evaluator via boolean values.
  • put all semantic checking function's in a SemaCheckContext class and propagate via this class to the expression evaluator. this class will be usable to propagate Sema and probably other informations.
  • keep the bit in ExprBitfield but make all nodes created in ExpressionEvaluationContext::ConstantEvaluated marked for constant evaluation to fixes limitation #2.
Tyker updated this revision to Diff 200231.May 20 2019, 3:28 AM
Tyker edited the summary of this revision. (Show Details)
Tyker edited the summary of this revision. (Show Details)
Tyker edited the summary of this revision. (Show Details)

there are issues with using ConstantExpr to mark expressions as constant for semantic checking:

  • #1 multpile Expr::Ignore* operation remove ConstantExpr from the expression.

Ignore* is generally used when walking the syntactic form of an expression, so this is usually appropriate.

  • #2 Semantic checking Traverses the AST so all methods that only mark the top-level Node will not completely work.

Semantic checks that care about constant evaluation should not walk over a ConstantExpr node. Generally, we want rather different checking inside a ConstantExpr node than outside (for example, we shouldn't be performing any checks for potential overflow or narrowing or conversions, and instead should be finding out what *actually* happens by evaluating the expression and diagnosing problems in it). So this seems fine too, though there may still be places that need updates to properly handle ConstantExpr.

  • #3 at short term: adding ConstantExpr modifies the AST structure, so adding it everywhere it is needed for semantic checking will require changing code that depend closely on the AST structure.

Yes. But that's going to be the case for any approach we take here.

  • push a ExpressionEvaluationContext::ConstantEvaluated so Sema will propagate it and propagate from Sema to the expression evaluator via boolean values.
  • put all semantic checking function's in a SemaCheckContext class and propagate via this class to the expression evaluator. this class will be usable to propagate Sema and probably other informations.
  • keep the bit in ExprBitfield but make all nodes created in ExpressionEvaluationContext::ConstantEvaluated marked for constant evaluation to fixes limitation #2.

We don't always know whether an expression is constant evaluated until after we've parsed it (eg, for a call to a function that might be consteval we need to perform overload resolution before we find out). And this would require changing all the (many) places that create Expr nodes to pass in new information. That sounds unpleasant and expensive, and adds a permanent cost to all future AST changes.

So far we don't appear to need any per-Expr indicator of whether that expression is transitively within a ConstantExpr, so let's not add that.

Tyker added a comment.EditedMay 20 2019, 2:35 PM

I changed the approach in the last revision. the information is now propagated using the expression evaluation context and then via booleans.

Interesting. I think all of the new warnings in the test cases here are undesirable (they duplicate errors produced by the constant evaluator), but the removed warnings all look like improvements.

Generally, I think our goals should be:

For code patterns that might lead to undefined behavior when executed with certain values:

  • If the code appears in a constant context (isConstantEvaluated() or a subexpression of a ConstantExpr), we should diagnose the problem from the constant evaluator if it actually happens and otherwise not diagnose it.
  • Otherwise, we should warn on the problem if the code is reachable and otherwise not diagnose it (that is, use DiagRuntimeBehavior).

For code patterns that are suspicious but defined (eg, we think they might do something other than what the programmer intended), we should typically diagnose using Diag, regardless of whether they appear in a constant context.

DiagRuntimeBehavior already deals with much of this for us: if called in a ConstantEvaluated context, it suppresses diagnostics.

One way we could proceed would be to extend what you did for CheckCompletedExpr to the cases where SemaChecking.cpp (and friends) recurse into a ConstantExpr: temporarily create a ConstantEvaluated expression evaluation context for the purpose of the checks, so that we can track that we're within such a construct. But an ExpressionEvaluationContextRecord is not a trivial thing to create and pass around, and carries additional semantic implications on top of the context kind, so perhaps just adding a flag to it to indicate that we're performing semantic analysis inside a ConstantExpr within the context would be better.

clang/lib/Sema/SemaChecking.cpp
4066–4073 ↗(On Diff #200231)

For __attribute__((nonnull)), I think the right thing to do is to change the constant expression evaluator to consider passing a null pointer argument for a nonnull-attributed parameter to be non-constant. (It's undefined behavior, so we should not permit it in constant evaluations.) For _Nonnull, I think the situation is less clear -- that case is explicitly not undefined behavior, so it's not clear that treating it as non-constant is the right choice. But as with the other cases here, isConstantEvaluated() does not imply the code is necessarily reachable, so bypassing DiagRuntimeBehavior's check for reachability doesn't seem like the appropriate behavior.

6089 ↗(On Diff #200231)

isConstantEvaluated() doesn't imply that the expression is necessarily reachable, so always giving an error here doesn't seem right. Instead, we should (already) have the relevant range checks in the constant evaluator for such builtins that are usable in constant expressions, and should never need to check here (if the code is reached during a constant evaluation, then the evaluation will be non-constant and hence ill-formed).

I think we should just return early from this function if isConstantEvaluated() like you do in checkFortifiedBuiltinMemoryFunction.

6612–6613 ↗(On Diff #200231)

If we're in a constant-evaluated context, I think we can and should simply skip all format string checking. There are three cases here:

  1. The call with the format attribute is unreached: no diagnostic should be issued.
  2. The call with the format attribute is reached but not constant-evaluatable: an error will be produced that we couldn't constant-evaluate.
  3. The call with the format attribute is reached and is constant-evaluatable: the constant evaluator needs to check that the format string matches the arguments anyway (because in general we permit non-constant format strings), so it will need to properly handle this case.

Case 3 is currently theoretical (we don't constant-evaluate any format-string functions). But in any case, I think the conclusion remains: we don't need to do these checks in a constant-evaluated context.

Tyker updated this revision to Diff 200662.EditedMay 22 2019, 2:17 AM

most comments were fixed. but there is a few point on which the direction isn't clear.

I think all of the new warnings in the test cases here are undesirable (they duplicate errors produced by the constant evaluator)

in the last revision warn_impcast_integer_precision_constant was always diagnosed in constant expression and diagnosed if reachable otherwise. this warning was responsible of all new warnings in test cases but isn't diagnosed by the constant evaluator. which is correct because integral casts have implementation defined or well defined behavior, not undefined behavior.

For code patterns that are suspicious but defined (eg, we think they might do something other than what the programmer intended), we should typically diagnose using Diag, regardless of whether they appear in a constant context.

this is not currently done even in non constant context. for example warn_impcast_integer_precision_constant is diagnosed via DiagRuntimeBehavior.

in this revision

constexpr int i0 = (long long)__builtin_is_constant_evaluated() * (1ll << 33); //#1

is diagnosed as it should but the reason it is diagnosed isn't correct. it is diagnosed only because CheckCompletedExpr doesn't push an ExpressionEvaluationContextRecord but just set isConstantEvaluatedOverride. so DiagRuntimeBehavior doesn't see the context as constant context. This means that other contexts that are correctly in constant context like:

case (long long)__builtin_is_constant_evaluated() * (1ll << 33):; //#1

will not be diagnosed even though they probably should.

Tyker marked 2 inline comments as done.May 22 2019, 2:18 AM

@rsmith friendly ping.

rsmith accepted this revision.Jun 7 2019, 11:20 AM

Thanks, looks great.

clang/lib/Sema/SemaChecking.cpp
13941 ↗(On Diff #200662)

permormed -> performed

This revision is now accepted and ready to land.Jun 7 2019, 11:20 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2019, 1:30 AM