This is an archive of the discontinued LLVM Phabricator instance.

[clang][Interp] Track initialization state of local variables
ClosedPublic

Authored by tbaeder on Oct 11 2022, 11:42 PM.

Details

Summary

This patch is a bit all over the place, sorry for that. I tried adding comments to Pointer and Block to explain the situation.

We already have a mechanism to track initialization state, i.e. InlineDescriptor. So, this patch adds such an inline descriptor to local variables as well. However, that requires a bit of management in Descriptor.

I didn't add a test for this specifically, but while working on it, plenty of things broke left and right whenever I did something wrong.

Diff Detail

Event Timeline

tbaeder created this revision.Oct 11 2022, 11:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 11:42 PM
tbaeder requested review of this revision.Oct 11 2022, 11:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 11:42 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tbaeder updated this revision to Diff 470724.Oct 26 2022, 4:21 AM
tbaeder updated this revision to Diff 471713.Oct 28 2022, 9:51 PM

Ping. This review has been open for a month now and a lot of following functionality depends on it, so if you need to decide which one of the patches to review, choose this one :)

aaron.ballman added inline comments.Nov 17 2022, 6:21 AM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
805–810

Double checking that Src.is<const Expr *>() is correct here. That's being passed as the IsTemporary argument. I presume the logic is that if the decl type is an expression then it has to be a temporary, but that's not entirely true. Consider C where a compound literal is *not* a temporary but actually creates an lvalue: (int){12}. Will that be an issue?

840
clang/lib/AST/Interp/Interp.h
944

Do we need to initialize() here as well?

956

Same here?

clang/lib/AST/Interp/InterpBlock.h
77–86

Should we have const-correct overloads to provide read-only access?

80–88

Why two levels of reinterpret_cast to the same type?

101

Why do we want to overwrite the metadata here?

clang/lib/AST/Interp/InterpFrame.cpp
34
clang/lib/AST/Interp/Pointer.h
41
43
clang/lib/AST/Interp/Program.cpp
337–338

Why does this one get no metadata while the rest of the descriptors created do?

I don't have enough knowledge about how this works to do a better review, but I have a couple of suggestions.

clang/lib/AST/Interp/ByteCodeExprGen.cpp
805–809

See suggested comment below in InterpBlock.

clang/lib/AST/Interp/Descriptor.cpp
212

Theres enough math on stuff like AllocSize/etc that we should probably:

1- do SOMETHING to make sure we're not overflowing
2- Have assertions to make sure we got the sizes looking reasonable.

Additionally, is there any ability to refactor out the common subset of initializers into a constructor and delegate to them instead of all these really fragile looking changes?

clang/lib/AST/Interp/Descriptor.h
102

add a line:
static constexpr MetadataSize InlineDescriptor = MetadataSize{sizeof(InlineDescriptor)};
and you can use this instead of depending on 'sizeof' all over the place.

shafik added inline comments.Nov 17 2022, 9:31 AM
clang/lib/AST/Interp/Context.cpp
128
clang/lib/AST/Interp/EvalEmitter.cpp
26–27
clang/lib/AST/Interp/Program.h
131

Delete commented out code.

shafik added inline comments.Nov 17 2022, 9:31 AM
clang/lib/AST/Interp/Descriptor.h
102

It feels weird to call this NoMetadata but we will pass this as an argument to a function with a parameter of MetaDataSize. So I am expecting a size but I am getting no meta data instead and it looks like a mistake.

Maybe a better name would be ZeroMetaDataSize?

clang/lib/AST/Interp/Pointer.h
50

I feel like the comment block in InterpBlock.h is more informative.

347–348

Why the -1

clang/lib/AST/Interp/Program.h
122
130
tbaeder marked 14 inline comments as done.Nov 17 2022, 9:53 PM
tbaeder added inline comments.
clang/lib/AST/Interp/ByteCodeExprGen.cpp
805–810

That interpretation is correct and I don't know what to do about compound literals yet.

clang/lib/AST/Interp/Descriptor.cpp
212

I can try to factor it out into a common constructor. As for overflows, there is code in Program.cpp to handle that (kinda, it doesn't take into account the initmap and matadata size, as far as I can see).

clang/lib/AST/Interp/Descriptor.h
102

I kinda get what your point is, but I don't think this is confusing.

102

I'm a bit concerned about the naming here; InlineDescriptor is already a type name, but calling it MetadataSizeInlineDesc is pretty long and callers already have to prepend a Descriptor:: to that. :/

clang/lib/AST/Interp/Interp.h
944

I think so, but bitfields are currently completely untested.

clang/lib/AST/Interp/InterpBlock.h
77–86

IIRC I added that at least once in a different patch but I abandoned that and haven't had a use for a const overload since.

80–88

No reason. The outer cast isn't needed.

101

This is only called after creating an new block, so nothing is being overwritten, the metadata hasn't been filled-in yet.

clang/lib/AST/Interp/Pointer.h
50

The content might overlap, but they don't describe what Offset and Base mean.

347–348

The inline descriptor is located before the Base (which is what's passed as Offset here). See the comment block added in Pointer.h.

clang/lib/AST/Interp/Program.cpp
337–338

This is the descriptor for the array elements. The metadata passed to this function is only being used for the outermost descriptor, i.e. we use it a few lines below for the actual array but we don't use it for the array elements.

tbaeder updated this revision to Diff 476334.Nov 17 2022, 9:55 PM
tbaeder marked 3 inline comments as done.
aaron.ballman added inline comments.Nov 18 2022, 6:03 AM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
805–810

My intuition is that IsTemporary should be set from Expr::isTemporaryObject() (or isa MaterializeTemporaryExpr maybe).

At the very least, you can add a .c test file with a compound literal in a _Static_assert and a FIXME comment (in the test and here in the code).

clang/lib/AST/Interp/Descriptor.h
102

Rather than using a sentinel value, should we be using Optional?

clang/lib/AST/Interp/Interp.h
944

That should be rectified. :-) It seems like that's reasonable to do as part of this patch given it's mostly just two changes here and a few test cases?

clang/lib/AST/Interp/InterpBlock.h
77–86

My worry is that retrofitting const correctness will eventually become too much of a chore and so we'll be adding yet another const-incorrect interface to the compiler. Seeing as how this is being reworked from scratch, I think we should use the opportunity to get better const-correctness, but maybe that ship has sailed?

101

Sounds like a good reason not to memset over that block then; it's useless work that will be thrown away anyway (I worry we may come to rely on this zero init accidentally)?

clang/lib/AST/Interp/Program.cpp
337–338

Don't we need the metadata to track whether the array elements have been initialized? e.g., it's not the array object that gets initialized, it's the elements that do, and we'd want to catch things like:

consteval int func() {
  int a[10];
  return a[2];
}
tbaeder marked 4 inline comments as done.Nov 18 2022, 6:24 AM
tbaeder added inline comments.
clang/lib/AST/Interp/ByteCodeExprGen.cpp
805–810

The Src.is<> is not a change introduced by this patch though.

As for a test, it'll just be rejected because we don't implement visiting CompoundLiterals at all right now.

clang/lib/AST/Interp/Interp.h
944

I can add the initialize() calls but I don't think adding bitfield test cases makes sense for this patch (pretty sure they will fail in unexpected ways anyway).

clang/lib/AST/Interp/InterpBlock.h
77–86

I don't think the ship has sailed at all. Is your argument that the function should be const because it can be? It's slightly problematic because they end up returning const char * as well. I can add the two overloads without problem, just not sure if they end up being used at all right now.

clang/lib/AST/Interp/Program.cpp
337–338

That's what the InitMap is for: it's just a bitmap storing one bit per element of a primitive array for its initialization state.

aaron.ballman added inline comments.Nov 22 2022, 5:36 AM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
805–810

The Src.is<> is not a change introduced by this patch though.

Then let's add a FIXME comment to remember to come back to look into this (I'm not convinced it was correct to begin with, but it is kind of orthogonal to this patch too).

As for a test, it'll just be rejected because we don't implement visiting CompoundLiterals at all right now.

Fair!

clang/lib/AST/Interp/Descriptor.h
100

I think this should use InterpSize rather than size_t given that it's used to assign into MDSize

102

I'm still wondering if Optional is cleaner than NoMetadata...

167–172

There's some interface awkwardness here where these values are of type InterpSize but the interface returns unsigned

clang/lib/AST/Interp/InterpBlock.h
77–86

I don't think the ship has sailed at all. Is your argument that the function should be const because it can be? It's slightly problematic because they end up returning const char * as well. I can add the two overloads without problem, just not sure if they end up being used at all right now.

My argument is that pairing operations that return a pointer to mutable data with a constant operation that returns a pointer to constant data makes it easier for us to add const-correct code in the future because people are less likely to run into the const correctness "tax" where they have to add overloads/markings virally.

clang/lib/AST/Interp/Program.cpp
337–338

Ahhh okay, thank you for the explanation!

tbaeder marked 10 inline comments as done.Nov 23 2022, 3:14 AM
tbaeder added inline comments.
clang/lib/AST/Interp/Descriptor.h
102

New version uses Optional<InterpSize>.

167–172

Yes. InterpSize is also only really used in Descriptor.h. No idea why.

tbaeder updated this revision to Diff 477437.Nov 23 2022, 3:15 AM
shafik added inline comments.Dec 1 2022, 2:04 PM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
834

Do we really want to the type of the expression? If we have a ValueDecl don't we want that type? I think they should be the same, do you have any examples where the expression is the type we want? I am looking at the AST from int x = 1+1L;

https://godbolt.org/z/q3Tr7Kxoq

clang/lib/AST/Interp/Descriptor.h
73

Maybe IsFieldMutable b/c we call CreateDescriptor it is a little confusing why we have IsConst and IsMutable

clang/lib/AST/Interp/Pointer.h
63–64

or -1u

clang/test/AST/Interp/cxx20.cpp
65–69

Can we also test local arrays and classes as well?

tbaeder added inline comments.Dec 2 2022, 12:00 AM
clang/lib/AST/Interp/InterpBlock.h
101

FWIW I looked into this and I think zeroing everything is what we want, so the initmap is null initially

tbaeder updated this revision to Diff 479582.Dec 2 2022, 3:25 AM
tbaeder marked an inline comment as done.
tbaeder added inline comments.Dec 2 2022, 3:38 AM
clang/lib/AST/Interp/Descriptor.h
73

Agreed, but this code was just moved up in this patch, the field was introduced earlier.

clang/lib/AST/Interp/Pointer.h
63–64

Alright, but those were not introduced in this patch.

aaron.ballman added inline comments.Dec 2 2022, 6:56 AM
clang/lib/AST/Interp/Descriptor.h
73

We can do a renaming pass once this lands, but FWIW, I'm also fine clarifying the names of things as we refactor code.

167–172

Yeah, this should be cleaned up (either here or in an NFC follow-up soon after this lands).

clang/lib/AST/Interp/InterpBlock.h
101

Hmmm, would it make more sense to have the init map setting that itself (in allocate()) rather than relying on invokeCtor() to do it?

clang/lib/AST/Interp/Pointer.h
63–64

Preference for ~0u so the types all match up nicely, fine to do in a post-commit NFC change.

tbaeder marked 9 inline comments as done.Dec 6 2022, 12:22 AM
tbaeder added inline comments.
clang/lib/AST/Interp/ByteCodeExprGen.cpp
834

I don't have a counter example but I didn't write that line either. The Expr case is for temporary values created for AST expressions, not sure what other type to use.

clang/lib/AST/Interp/InterpBlock.h
101

It's a bit blurry to me regarding the layering with Pointer/Descriptor. Descriptors don't know anything about the metadata, but pointers know that primitive arrays have an initmap. So to keep that consistent, we'd have to create a pointer to create the initmap.

But I don't think it's unreasonable to expect zero-initialization for types used in the interpreter.

clang/test/AST/Interp/cxx20.cpp
93

It might be interesting to know that the output for this test changes with https://reviews.llvm.org/D135750. After that patch, we only note that the

aaron.ballman added inline comments.Dec 6 2022, 11:17 AM
clang/lib/AST/Interp/InterpBlock.h
101

It's a bit blurry to me regarding the layering with Pointer/Descriptor. Descriptors don't know anything about the metadata, but pointers know that primitive arrays have an initmap. So to keep that consistent, we'd have to create a pointer to create the initmap.

But I don't think it's unreasonable to expect zero-initialization for types used in the interpreter.

My concern is the overhead of doing initialization twice in the interpreter. If we're going to zero init and then mostly write over the top of what we initialized, the zero init busywork we don't need to perform. Secondarily, I worry about relying on that zero init when it's not done as part of initialization due to fragility (it's a multi-step init at this point; we allocate, something else zeros, and something else fills in the data).

tbaeder added inline comments.Dec 8 2022, 11:46 PM
clang/lib/AST/Interp/InterpBlock.h
101

Right, I understand the concerns.

I have looked into using data() instead here, but that causes other problems with arrays of unknown size. (We assert in getSize()). But I know where the right place to initialize the initmap would be now.

As for performance, this shouldn't make a difference to before, since the metadata is always small. We might of course want to get rid of the memset altogether, but both that and not zero-ing the metadata needs a different patch in any case.

So I think this is okay for this patch.

tbaeder updated this revision to Diff 484212.Dec 20 2022, 4:00 AM

This looks good to me but I see at least one concern that Aaron had that he did not get back on, so I will wait for him to approve.

clang/lib/AST/Interp/ByteCodeExprGen.cpp
834

I think it should be an else if we either have a Decl or an Expr right?

clang/lib/AST/Interp/ByteCodeStmtGen.cpp
408

So by moving the check for Init forward, we will still allocate but we may not initialize.

clang/lib/AST/Interp/Descriptor.h
73

I am happy fixing this separately, please leave a TODO

clang/lib/AST/Interp/Interp.h
918

Can you explain what is going on here? Why do we need to initialize if it is not root?

tbaeder marked 4 inline comments as done.Dec 21 2022, 1:44 AM
tbaeder added inline comments.
clang/lib/AST/Interp/ByteCodeExprGen.cpp
834

Yes, that should work as well.

clang/lib/AST/Interp/ByteCodeStmtGen.cpp
408

Correct.

clang/lib/AST/Interp/Interp.h
918

Root means that Base == 0, so it cannot have a InlineDescriptor before the Base offset. That is mainly the case for global variables.

tbaeder updated this revision to Diff 484499.Dec 21 2022, 1:45 AM
tbaeder marked 2 inline comments as done.
shafik added inline comments.Dec 21 2022, 1:15 PM
clang/lib/AST/Interp/Interp.h
918

Can we add a comment that explains this a little better? I still don't get the significance, so I definitely missing something.

Also, why do we initialized before the store?

tbaeder added inline comments.Dec 22 2022, 2:26 AM
clang/lib/AST/Interp/Interp.h
918

I don't think there's a significance to the order of operations here, some do the init after the store, some the other way around.

tbaeder updated this revision to Diff 485271.Dec 26 2022, 12:33 AM

Add missing inline descriptor handling to local variables created in the EvalEmitter.

tbaeder updated this revision to Diff 488996.Jan 13 2023, 7:14 AM
shafik accepted this revision.Jan 13 2023, 9:13 AM

LGTM

@aaron.ballman are you happy with this?

This revision is now accepted and ready to land.Jan 13 2023, 9:13 AM
tbaeder updated this revision to Diff 489693.Jan 16 2023, 9:00 PM
aaron.ballman accepted this revision.Jan 18 2023, 5:59 AM

LGTM!

clang/lib/AST/Interp/Pointer.h
63–64

Don't forget to hit this one in post-commit.