This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Mark implicitly-inserted ICE's as being part of explicit cast (PR38166)
ClosedPublic

Authored by lebedev.ri on Jul 18 2018, 1:28 PM.

Details

Summary

As discussed in PR38166, we need to be able to distinqush whether the cast
we are visiting is actually a cast, or part of an ExplicitCast.
There are at least four ways to get there:

  1. Introduce a new CastKind, and use it instead of IntegralCast if we are in ExplicitCast.

    Would work, but does not scale - what if we will need more of these cast kinds?
  2. Introduce a flag in CastExprBits, whether this cast is part of ExplicitCast or not.

    Would work, but it isn't immediately clear where it needs to be set.
  3. Fix ScalarExprEmitter::VisitCastExpr() to visit these NoOp casts.

    As pointed out by @rsmith, CodeGenFunction::EmitMaterializeTemporaryExpr calls

    skipRValueSubobjectAdjustments, which steps over the CK_NoOp cast`,

    which explains why we currently don't visit those.

    This is probably impossible, as @efriedma points out, that is intentional as per [class.temporary] in the standard
  4. And the simplest one, just record which NoOp casts we skip.

    It just kinda works as-is afterwards.

But, the approach with a flag is the least intrusive one, and is probably the best one overall.

Diff Detail

Repository
rC Clang

Event Timeline

lebedev.ri created this revision.Jul 18 2018, 1:28 PM

skipRValueSubobjectAdjustments has to match the rules in [class.temporary] in the standard, which includes skipping over certain explicit casts.

Would it be enough to accumulate the skipped casts into a SmallVector, like we do for the skipped comma operators?

test/CodeGenCXX/const-init-cxx11.cpp
229 โ†—(On Diff #156137)

This change is suspect; note the comment "this creates a non-const temporary".

I think it would be reasonable to set a flag on ImplicitCastExprs that are actually semantically part of an explicit cast. I don't think that would be hard to get Sema to do, either by passing a flag down to the code that builds those casts or just by retroactively setting that flag on all the ICE sub-expressions of an explicit cast when "capping" it with the ExplicitCastExpr.

skipRValueSubobjectAdjustments has to match the rules in [class.temporary] in the standard, which includes skipping over certain explicit casts.

I'm this approach because this is what @rsmith suggested. As i said, i don't really know this code,
so maybe i'm missing something obvious that makes this still possible.

Would it be enough to accumulate the skipped casts into a SmallVector, like we do for the skipped comma operators?

Hmm, I'm not sure. It depends whether we return from CodeGenFunction::EmitMaterializeTemporaryExpr(),
or are still within the CodeGenFunction::EmitMaterializeTemporaryExpr() function, when visiting nested nodes.
This might work, but let's see if i can make the flag approach work.

I think it would be reasonable to set a flag on ImplicitCastExprs that are actually semantically part of an explicit cast. I don't think that would be hard to get Sema to do, either by passing a flag down to the code that builds those casts or just by retroactively setting that flag on all the ICE sub-expressions of an explicit cast when "capping" it with the ExplicitCastExpr.

That was my initial thought, too.
I wasn't sure how to approach this though.

lebedev.ri retitled this revision from [CodeGen] VisitMaterializeTemporaryExpr(): don't skip NoOp Casts. to [Sema] Expr::skipRValueSubobjectAdjustments(): record skipped NoOp casts..
lebedev.ri edited the summary of this revision. (Show Details)

Would it be enough to accumulate the skipped casts into a SmallVector, like we do for the skipped comma operators?

Duuuh, why did i not try this in the first place?!
That works, thanks!

Hmm. I think the approach of flagging ICEs that are semantically part of an explicit cast is probably a better representation for tools across the board.

If we *are* going to do it this way, though, I think you should (1) make the collection of skipped expressions optional and (2) collect *all* the skipped expressions, not just no-op casts.

Hmm. I think the approach of flagging ICEs that are semantically part of an explicit cast is probably a better representation for tools across the board.

I could do that, but i couldn't find where it should be done.
Where are these "ICEs that are semantically part of an explicit cast" created? Where would we mark them?

If we *are* going to do it this way, though, I think you should (1) make the collection of skipped expressions optional and (2) collect *all* the skipped expressions, not just no-op casts.

  1. I was wondering about that, will do.
  2. Well, i could do that, but i would need to filter them afterwards in my use-case. So i wonder - what is the motivation for that? Nothing else needs that additional knowledge right now.

Hmm. I think the approach of flagging ICEs that are semantically part of an explicit cast is probably a better representation for tools across the board.

I could do that, but i couldn't find where it should be done.
Where are these "ICEs that are semantically part of an explicit cast" created?

The design of CastExpr is that each node does exactly one implicit conversion step, but sometimes a conversion involves performing multiple steps which each has to get its own node, so all those initial conversions have to be represented as ICEs. Also, while we do make some effort to give the explicit cast node a meaningful CastKind, in some situations it's just easier to build an ICE with the real CK and then just make the explicit cast a NoOp. All those ICEs are triggered by the checking of the explicit cast and therefore semantically part of that operation even if they're separate nodes for technical reasons.

Where would we mark them?

Well, explicit casts are handled by SemaCast.cpp, and all the checking routines are managed by a CastOperation helper class. In fact, it's set up so that you could very easily (1) stash the original expression in the CastOperation construct and then (2) in complete(), just walk from the given cast up to the original expression and mark every ICE you see as being part of the explicit cast.

If we *are* going to do it this way, though, I think you should (1) make the collection of skipped expressions optional and (2) collect *all* the skipped expressions, not just no-op casts.

  1. I was wondering about that, will do.
  2. Well, i could do that, but i would need to filter them afterwards in my use-case.

Well, you're filtering them anyway, right? You're scanning the list for explicit casts.

So i wonder - what is the motivation for that?
Nothing else needs that additional knowledge right now.

It has a much more obvious specification and it's not hard to imagine clients that would want to scan that list for other things. (In fact, the comma-expressions list could pretty easily be combined into that.)

Hmm. I think the approach of flagging ICEs that are semantically part of an explicit cast is probably a better representation for tools across the board.

I could do that, but i couldn't find where it should be done.
Where are these "ICEs that are semantically part of an explicit cast" created?

The design of CastExpr is that each node does exactly one implicit conversion step, but sometimes a conversion involves performing multiple steps which each has to get its own node, so all those initial conversions have to be represented as ICEs. Also, while we do make some effort to give the explicit cast node a meaningful CastKind, in some situations it's just easier to build an ICE with the real CK and then just make the explicit cast a NoOp. All those ICEs are triggered by the checking of the explicit cast and therefore semantically part of that operation even if they're separate nodes for technical reasons.

Where would we mark them?

Well, explicit casts are handled by SemaCast.cpp, and all the checking routines are managed by a CastOperation helper class. In fact, it's set up so that you could very easily (1) stash the original expression in the CastOperation construct and then (2) in complete(), just walk from the given cast up to the original expression and mark every ICE you see as being part of the explicit cast.

Hmm, thank you for these pointers. I agree that it would be *nicer*. Let's see..

If we *are* going to do it this way, though, I think you should (1) make the collection of skipped expressions optional and (2) collect *all* the skipped expressions, not just no-op casts.

  1. I was wondering about that, will do.
  2. Well, i could do that, but i would need to filter them afterwards in my use-case.

Well, you're filtering them anyway, right? You're scanning the list for explicit casts.

I meant filter the list to only contain skipped casts, before actually scanning it for specific conditions.

So i wonder - what is the motivation for that?
Nothing else needs that additional knowledge right now.

It has a much more obvious specification and it's not hard to imagine clients that would want to scan that list for other things. (In fact, the comma-expressions list could pretty easily be combined into that.)

lebedev.ri retitled this revision from [Sema] Expr::skipRValueSubobjectAdjustments(): record skipped NoOp casts. to [Sema] Mark implicitly-inserted ICE's as being part of explicit cast (PR38166).
lebedev.ri edited the summary of this revision. (Show Details)

Moved to the flag approach, which was my initial thought.
This ended up being rather too simple, of all the alternative solutions.
Oh well, that was rather sad waste of time.

rjmccall added inline comments.Jul 20 2018, 11:09 AM
include/clang/AST/Stmt.h
205โ€“206

This needs to be unsigned to pack optimally on MSVC.

206

This needs to be serialized.

lebedev.ri added inline comments.Jul 20 2018, 11:35 AM
include/clang/AST/Stmt.h
206

Uhm, could you please explain what do you mean by 'serialized'?

rjmccall added inline comments.Jul 20 2018, 11:40 AM
include/clang/AST/Stmt.h
206

It needs to be preserved when writing an ICE into a PCH / module file. See the ASTWriter / ASTReader.

lebedev.ri marked 4 inline comments as done.
  • Use unsigned, not bool.
  • Serialization, although without tests, and likely incompatible with previous versions.
include/clang/AST/Stmt.h
206

Aha. I did add handling there but it raises questions:

  1. This will silently break with different AST serialization versions. I'm not sure how to handle it, since VERSION_MINOR isn't even read back.
  2. Does this need a test? How to write one? Like ./test/PCH/include-timestamp.cpp, using llvm-bcanalyzer?
rsmith added inline comments.Jul 20 2018, 1:55 PM
include/clang/AST/Stmt.h
206

Don't worry about breaking the serialization format. We do not maintain AST file format compatibility in general (neither between major releases nor between any two arbitrary SVN revisions). [We should probably maintain file format compatibility between patch releases, but I don't think that works right now because we check the full version number including the patch level when reading a file.]

Please do add a test: what you need to do is create a PCH containing an implicit cast expression and then import that AST file and do anything to check that the value is preserved (such as inspecting the output of -ast-dump). Maybe you could add this to the existing test/PCH/cxx_exprs.cpp test, which already does most of what you want, but doesn't have any FileCheck tests on the -ast-dump output yet.

lib/AST/ASTDumper.cpp
2122

Our predominant convention is to use lower_snake_case for such things in -ast-dump (though we're not exactly consistent on this...)

lib/Sema/SemaCast.cpp
92โ€“99

You don't need to track the OrigSrcExpr here. You can just recurse down through all the ImplicitCastExprs (they're always all notionally part of the explicit cast).

lebedev.ri added inline comments.Jul 21 2018, 12:45 AM
lib/Sema/SemaCast.cpp
92โ€“99

We do sometimes have OrigSrcExpr being ImplicitCastExpr.
https://godbolt.org/g/S5951G <- that ImplicitCastExpr would now get marked, even though it is OrigSrcExpr.
Is that expected?

lebedev.ri added inline comments.Jul 21 2018, 3:04 AM
include/clang/AST/Stmt.h
206

Uhm, so i have tried to add a test.
With git master, the -ast-dump from the first runline of test/PCH/cxx_exprs.cpp contains:

|-TypedefDecl 0x55a348020b88 </build/clang/test/PCH/cxx_exprs.h:5:1, col:44> col:44 referenced static_cast_result 'typeof (static_cast<void *>(0))':'void *'
| `-TypeOfExprType 0x55a348020b50 'typeof (static_cast<void *>(0))' sugar
|   |-ParenExpr 0x55a348020b28 <col:19, col:42> 'void *'
|   | `-CXXStaticCastExpr 0x55a348020af8 <col:20, col:41> 'void *' static_cast<void *> <NoOp>
|   |   `-ImplicitCastExpr 0x55a348020ae0 <col:40> 'void *' <NullToPointer>
|   |     `-IntegerLiteral 0x55a348020aa8 <col:40> 'int' 0
|   `-PointerType 0x55a347fe6390 'void *'
|     `-BuiltinType 0x55a347fe5b90 'void'

But the last runline (which uses PCH) does not have that.
Unfortunately i know next to nothing about PCH / AST serialization, so i'm not sure how to workaround that yet.
It does contain `-<undeserialized declarations>, so maybe i have to somehow force full deserialization?

lebedev.ri marked an inline comment as done.

Partially address some of @rsmith's review notes, see reply notes for issues.

lebedev.ri marked 3 inline comments as done.

Hurray, got the PCH test working!
I'm still unsure about marking *all* the immediate implicit casts as part of the group.

rsmith accepted this revision.Jul 23 2018, 5:47 PM
rsmith added inline comments.
lib/Sema/SemaCast.cpp
92โ€“99

Yes, that's expected, the conversion from int to char there is part of the semantics of the explicit cast from int to vector of char, and you wouldn't want your UBSan check to warn about that, right?

This revision is now accepted and ready to land.Jul 23 2018, 5:47 PM
lebedev.ri marked 3 inline comments as done.Jul 23 2018, 11:11 PM

Thank you for the review!

lib/Sema/SemaCast.cpp
92โ€“99

I was just surprized since that ImplicitCastExpr was there before this CastExpr handling.
If i make that splat implicit, it now warns about the truncation, https://godbolt.org/g/Ax7UTw
So yeah, i guess this is correct.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.