This is an archive of the discontinued LLVM Phabricator instance.

[clang-repl] Introduce Value to capture expression results
ClosedPublic

Authored by junaire on Jan 7 2023, 10:58 PM.

Details

Summary

This is the second part of the below RFC:
https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493

This patch implements a Value class that can be used to carry expression
results in clang-repl. In other words, when we see a top expression
without semi, it will be captured and stored to a Value object. You can
explicitly specify where you want to store the object, like:

Value V;
llvm::cantFail(Interp->ParseAndExecute("int x = 42;"));
llvm::cantFail(Interp->ParseAndExecute("x", &V));

V now stores some useful infomation about x, you can get its real
value (42), it's clang::QualType or anything interesting.

However, if you don't specify the optional argument, it will be captured
to a local variable, and automatically called Value::dump, which is
not implemented yet in this patch.

Signed-off-by: Jun Zhang <jun@junz.org>

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
junaire updated this revision to Diff 512837.Apr 12 2023, 8:01 AM

dont include <new> if we don't have it

I've not done a complete review yet, but I got started on it and have a handful of comments.

clang/include/clang/Basic/TokenKinds.def
945 ↗(On Diff #512801)

Should we name this repl_input_end to make it clear this is generally expected to only be used for REPL input?

clang/include/clang/Interpreter/Interpreter.h
60

I think I'm surprised to see this as a data member of Interpreter but mostly because my brain keeps thinking this interpreter is to interpret whole programs, so the idea of a "last value" is a bit odd from that context. The name is probably fine as-is, but I figured it may be worth mentioning just the same.

66

Overload set for some better const correctness.

74

Hopefully this function isn't intending to mutate the passed object?

100–102

const overload here as well.

104

Any chance we can make this const correct as well? (I'll stop asking in this review -- can you take a pass through it to add const correctness where it isn't obnoxiously viral to do so?)

clang/include/clang/Interpreter/Value.h
18

Unused include can be removed.

44

Duplicates line 27, can be removed.

45

Move this up to the rest of the forward declarations (and probably keep them in alphabetical order).

47

Is this expected to be a complete list of builtin types? e.g., should this have char8_t and void and wchar_t, etc? Should this be including clang/include/clang/AST/BuiltinTypes.def instead of manually maintaining the list?

78

Fixing indentation

84

Why do these take void * instead of the expected type?

122

Do we have to worry about member function pointers where a single pointer value may not be sufficient?

144

This doesn't match the usual pattern used in Clang where the get variant returns nullptr and the cast variant asserts if the cast would be invalid. Should we go with a similar approach?

161–163

Why don't forward declares suffice if we're storing the information by pointer?

clang/lib/CodeGen/CodeGenModule.cpp
7226 ↗(On Diff #510885)

+1 -- the codegen code owners should weigh in on whether this is reasonable as a temporary measure or not.

clang/lib/Interpreter/CMakeLists.txt
16–19

Probably should be kept alphabetical.

clang/lib/Interpreter/IncrementalParser.h
34–35

Alphabetical order

This looks a lot better already than the implementation I know from Cling. Well done!

OTOH it's a challenge to review since the patch is really big. Is there no way to break it up and test in isolation? Or strip away details that could be added in iterations? I had a short look and some comments, but I won't find the time in the coming weeks to give it the attention it deserves.

My biggest conceptual concern is that Value is part of the Interpreter library. This is going to be a showstopper for out-of-process execution. Conceptually, it should move into a runtime library (like the ORC runtime in compiler-rt) and this won't get easier the more it is entangled with the interpreter. I do see that this adds significant complexity and I guess it's ok to keep this for later, but I do wonder what would be the best way to prepare for it. Essentially, all communication with the interpreter will be asynchronous. Is this something we can do already?

clang/include/clang/Interpreter/Interpreter.h
85

Most of the Orc and JITLink moved away from JITTargetAddress already and uses ExecutorAddr nowadays. I think we should use ExecutorAddr from the beginning in this patch.

clang/include/clang/Interpreter/Value.h
99

Can we make this private and try not to introduce more dependencies on the interpreter instance?

The inherent problem is that we will have to wire these through Orc RPC in the future once we want to support out-of-process execution.

clang/lib/Interpreter/Interpreter.cpp
411

In Orc we are usually prefixing such functions with the namespace and class name from where they are used. While it can impact readability, it gives these names a structure and does prevent collisions, e.g.:
__clang_Interpreter_SetValueNoAlloc.

539

Is this a leftover from an old default label?

clang/lib/Interpreter/Value.cpp
92

Can we add a comment with an explanation of the magic numbers please?

junaire updated this revision to Diff 513918.Apr 15 2023, 8:53 AM

Address some comments

junaire marked 16 inline comments as done.Apr 15 2023, 8:57 AM
junaire added inline comments.
clang/include/clang/Interpreter/Interpreter.h
74
104

FindRuntimeInterface mutates ValuePrintingInfo vector so it can't be const afaict.

clang/include/clang/Interpreter/Value.h
18

uintptr_t comes from this header so it's not unused.

78

It's already formatted by clang-format and if I change this arc doesn't even allow me to upload the diff. I believe it's a bug in clang-format and I have filed https://github.com/llvm/llvm-project/issues/60535 So I'd like to fix this when I commit this :)

clang/lib/Interpreter/Interpreter.cpp
539

there's no default label since we handle all the cases here. But indeed it looks a bit ugly and I removed it.

clang/lib/Interpreter/Value.cpp
92

I'm uncertain if these numbers actually have specific meanings, I just copied them from Cling directly.

junaire marked 4 inline comments as done.Apr 15 2023, 9:42 AM
junaire added inline comments.
clang/include/clang/Interpreter/Interpreter.h
85
v.g.vassilev added inline comments.Apr 15 2023, 9:49 AM
clang/lib/CodeGen/CodeGenModule.cpp
7226 ↗(On Diff #510885)
junaire updated this revision to Diff 513926.Apr 15 2023, 9:56 AM
junaire marked an inline comment as done.

Address more comments

junaire marked an inline comment as done.Apr 15 2023, 10:01 AM
junaire added inline comments.
clang/include/clang/Interpreter/Value.h
84

Yeah for the first parameter, we could just use Interpreter* but the second one is an opaque type so I think we should keep it?

junaire updated this revision to Diff 513954.Apr 15 2023, 6:56 PM
junaire marked an inline comment as done.

Rebase

junaire marked an inline comment as done.Apr 15 2023, 7:03 PM
junaire added inline comments.
clang/include/clang/Interpreter/Value.h
99

Ughh, I don't understand can you elaborate a bit? We can't make these accessors private since they are meaningful API? it will be used intensively in ValuePrinter.cpp (the second patch), and the end users may want to use them as well.

144

These APIs are adopted in Cling and Cling will remove the old implementation and use the upstream version of the value printing feature at some point in the future. So @v.g.vassilev hope we can keep these APIs close to Cling's variants as much as we can. I don't have a strong opinion here, what's your take here? @v.g.vassilev

I added some of background and historical notes. I hope that helps.

clang/include/clang/Interpreter/Interpreter.h
60

One way to think of this is every repl has some way to access the last result it created when executed its last chunk of input. In python that's the _ value. In the debugger it is the %N where N is some number. Here this variable would enable implementing a similar to these in future.

clang/include/clang/Interpreter/Value.h
47

This is used for various things including storing the bits into a big-endian agnostic way. For void we have a special case in the Value and we cannot define a union of a void type. We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.

We cannot use BuiltinTypes.def because we want to have a mapping between the type as written in the language (eg, bool, unsigned, etc) and its underlying type name. That mapping is not available in BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. _Bool vs bool) and I am not sure this can be resolved by tablegen.

On a broader perspective, the Value class is responsible for two things: (a) get a value from the interpreter to compiled code (see test case); (b) set a value from compiled code to the interpreter. The second case is not yet covered (I can open soon another patch). The major use-case is calling at runtime functions and taking input parameters from compiled code.

FWIW, we should probably move all of these entities in a separate namespace. I'd suggest caas (compiler-as-a-service) and possibly rename the Value to InterpreterValue since Value is very generic and there are already a couple of classes with that name in llvm and clang.

84

See my previous comments on performance. We cannot include anything bulky in the header file.

99

I think what @sgraenitz means here is that the Value class combines two things. Essential information about what happened in the JIT and its higher-level type representation by the language it came from.

In theory, and I don't know how, we could split this somehow into two. The first part would become a pure JIT thing which only cares about putting bits in and out of the JIT and the second part would be the one that attach the exact meaning of what these bits are in the high-level language.

This would allow us to run C++ on small devices which cannot fit the entire clang-repl/jit/llvm infrastructure. That would happen by creating an out-of-process execution which then communicates with the host with the Value class. That would be a critical component to have, however, I'd prefer to land this and then do some research how to make this happen. This patch is critical for several projects already and I don't think we can wait too much. The lack of this patch makes clang-repl non-functional.

I believe @sgraenitz described the out-of-process execution in his talk here: https://compiler-research.org/meetings/#caas_10Mar2022 @sgraenitz please feel free to correct me if my interpretation of your words incorrect.

122

I am not sure if I understand your comment correctly. We only store objects and not function/member pointers (perhaps in some pathological way one could do that but I'd say it is not supported). Here we want to know if we stored it as a non-builtin type we need to make the pointer-like casts. My take is that for now we shouldn't worry about member function pointers.

144

I probably see the confusion. The getAs<T> operation is only relevant for pointer types. For example we stored object of type UserClass as a pointer. For instance, I don't see how getAs<float> can return a nullptr. We can probably inline the implementation of getAs for non-pointer types into castAs and leave getAs unimplemented relying only on the explicit specialization bellow for pointers.

The castAs operation means take a valid bit representation and transforms it into the requested one. I don't think it can or should be ever invalid. Here we are reshuffling bits.

These changes are very subtle and if we decide to make changes we should open a pull request against the ROOT project to make sure we are not breaking some important case. The reference implementation is here: https://github.com/root-project/root/blob/master/interpreter/cling/include/cling/Interpreter/Value.h

161–163

This is a performance-critical class. We literally measure the instruction count for it. We practically cannot include anything in this header file because the class needs to included as part of the interpreter runtime. For example:

#include <clang/Interpreter/Value.h>
Value ResultV;
gInterpreter->evaluate("float i = 12.3; i++", &V);
printf("Value is %d\n", ResultV.castAs<int>());

This is how you can do things in Cling. This not yet there but that's our next step.

For performance reasons we have the ValueKind optimization which allows us to perform most of the operations we need very fast. There are some operations such as printing the concrete type which need the actual QualType and so on but they are outside of the performance critical paths and it is okay to resort back to the real types providing the level of accuracy we need.

OTOH it's a challenge to review since the patch is really big. Is there no way to break it up and test in isolation? Or strip away details that could be added in iterations? I had a short look and some comments, but I won't find the time in the coming weeks to give it the attention it deserves.

My biggest conceptual concern is that Value is part of the Interpreter library. This is going to be a showstopper for out-of-process execution. Conceptually, it should move into a runtime library (like the ORC runtime in compiler-rt) and this won't get easier the more it is entangled with the interpreter. I do see that this adds significant complexity and I guess it's ok to keep this for later, but I do wonder what would be the best way to prepare for it. Essentially, all communication with the interpreter will be asynchronous. Is this something we can do already?

Similar thoughts here. In particular, is there a way to break the IsSemiMissing/ repl_input_end parts out from the Value part?

It sounds like it would be helpful for clang-repl to land this as-is, but we should have a plan to replace this with something compatible with out-of-process execution in the longer term. At a guess I think that would mean using regular global variable instances to manage the storage on the executor side, an extended MemoryAccess interface to read/write the value from the REPL side when needed (e.g. for printing), and emitting glue functions to pass the variable's value in to callers.

junaire updated this revision to Diff 516080.Apr 22 2023, 9:19 AM

Break the IsSemiMissing/ repl_input_end parts out from the Value part

junaire updated this revision to Diff 516081.Apr 22 2023, 9:23 AM

Rebuild, please.

junaire updated this revision to Diff 517829.Apr 28 2023, 12:55 AM

fix windows buildbots

aaron.ballman added inline comments.Apr 28 2023, 8:25 AM
clang/include/clang/Interpreter/Interpreter.h
96

It looks like this can be private?

Also, just a note (not something you have to deal with in this review), the local naming convention seems to have decided that if the function starts with get then it's camel case and otherwise the name is pascal case.

clang/include/clang/Interpreter/Value.h
47

We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.

Okay, so the plan is to handle all the builtin types (_BitInt, _Complex, various floating point formats, etc)? Will that be part of this patch or in follow-up work? (My intuition is that we should consider it up front because some of the builtin types are interesting -- like _BitInt because it's parameterized, which makes it novel compared to the other types.)

We cannot use BuiltinTypes.def because we want to have a mapping between the type as written in the language (eg, bool, unsigned, etc) and its underlying type name. That mapping is not available in BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. _Bool vs bool) and I am not sure this can be resolved by tablegen.

Thanks for the explanation! BuiltinTypes.def works well enough for times when we want to use macros and include the file to generate switch cases and the likes, but you're right that it's not well-suited for this. One thing to consider is whether we should change BuiltinTypes.def to be BuiltinTypes.td instead and use tablegen to generate the macro/include dance form as well as other output (such as for your needs, that can then consider language options or more complex predicates).

FWIW, we should probably move all of these entities in a separate namespace. I'd suggest caas (compiler-as-a-service) and possibly rename the Value to InterpreterValue since Value is very generic and there are already a couple of classes with that name in llvm and clang.

I'm not in love with the name caas because that's not really a common acronym or abbreviation (and it looks like a typo due to aa). However, we already have an interp namespace in Clang for one of the other interpreters (constant expression evaluation), so that's not available for use. How about repl though?

As for considering changing the name from Value because of how many other Value types we have already... that's both a reason to rename and reason not to rename. I think I'm fine leaving it as Value so long as it's in a novel namespace.

78

Totally fine to fix this on commit, thank you for filing the issue!

84

I think I understand why the design is the way it is, but it still makes me uneasy. The constructor takes a pointer to some bucket of bytes... no size information, no type information, etc. Just "here's a random pointer". And then later, we hope the user calls setKind() in a way that makes sense.

We need it to be fast, but we also need it to be correct -- the type system is the best tool for helping with that.

105–109

If we can't get type safety through construction, should we consider adding assertions here that a value's kind and opaque type agree, or that we're not changing the type or kind without also changing the pointer?

122

Given:

struct Foo {
struct Base {
  virtual int bar();
};

struct Foo : Base {
  int bar() { return 12; }
};

int (Foo::*bar_ptr)() = &Foo::bar;

The object bar_ptr requires two pointers of space to represent: https://godbolt.org/z/o81sbjKM6, so I'm wondering whether Value can represent bar_ptr

144

I think castAs can be invalid -- if you have the bits for a float and you try to cast it as a pointer, that's not going to lead to good things, right?

My point is largely that the names getAs and castAs have meaning within Clang already and this seems to assign slightly different meaning to them, which might trip folks up. It might be worth considering renaming them to something like bitCast and pointerCast (or something else, I'm not tied to these names)?

161–163

That sounds like it's going to lead to maintenance problems in the long term, right? I can't think of another header file we say "don't touch this because it may impact runtime performance", and so I can easily imagine someone breaking your expectation that this file can't include anything else.

Is there a long-term plan for addressing this?

clang/lib/Interpreter/IncrementalParser.cpp
21

Does it? This file is in clang/lib/Interpreter so including other things from clang/Interpreter would not be a layering violation normally.

junaire updated this revision to Diff 518166.Apr 29 2023, 8:03 AM
junaire marked 4 inline comments as done.

Partially address some comments

clang/include/clang/Interpreter/Interpreter.h
96

Fixed.

clang/include/clang/Interpreter/Value.h
84

Not really... The user doesn't need to call setKind() explicitly to construct a Value, the constructor will handle it automatically. See ConvertQualTypeToKind in Value.cpp. So if the pointer is just some garbage data, the constructor should fail before yielding out a valid instance.

clang/lib/Interpreter/IncrementalParser.cpp
21

That's an old comment, already fixed and marked it as done :)

junaire updated this revision to Diff 518265.Apr 29 2023, 8:31 PM

Add unittests for void & member pointers types

Thanks again for your efforts pushing this patch. It has gone a long way and I believe we are getting there.

clang/include/clang/Interpreter/Value.h
47

We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.

Okay, so the plan is to handle all the builtin types (_BitInt, _Complex, various floating point formats, etc)? Will that be part of this patch or in follow-up work? (My intuition is that we should consider it up front because some of the builtin types are interesting -- like _BitInt because it's parameterized, which makes it novel compared to the other types.)

That's a good point. I think the current patch offers a new capability and it is probably fine to not address all at once. My concern is that @junaire has a month to work on these things and this patch is the first out of two patches. The risk here is to drop the ball on that altogether if we add more work as a requirement for that patch to go in.

We cannot use BuiltinTypes.def because we want to have a mapping between the type as written in the language (eg, bool, unsigned, etc) and its underlying type name. That mapping is not available in BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow but I'd prefer outside of this patch. One of the challenges is that some of the types depend on the language options (eg. _Bool vs bool) and I am not sure this can be resolved by tablegen.

Thanks for the explanation! BuiltinTypes.def works well enough for times when we want to use macros and include the file to generate switch cases and the likes, but you're right that it's not well-suited for this. One thing to consider is whether we should change BuiltinTypes.def to be BuiltinTypes.td instead and use tablegen to generate the macro/include dance form as well as other output (such as for your needs, that can then consider language options or more complex predicates).

I am totally with you here. I am not sure how to do this since as of now some of the types and their form as written are decided with a flag (IIRC the various char flavors).

FWIW, we should probably move all of these entities in a separate namespace. I'd suggest caas (compiler-as-a-service) and possibly rename the Value to InterpreterValue since Value is very generic and there are already a couple of classes with that name in llvm and clang.

I'm not in love with the name caas because that's not really a common acronym or abbreviation (and it looks like a typo due to aa). However, we already have an interp namespace in Clang for one of the other interpreters (constant expression evaluation), so that's not available for use. How about repl though?

The only problem I have with repl is misleading. repl usually means to people something that you run at your prompt and can type some expressions in and see some results. However, here we are building a foundational primitive to allow embedding an interpreter as part of your static program and be able to cross the compiler/interpreter boundary. The compiler-as-a-service (caas) is probably the closest I could find in CS research terminology to describe that. Would adding some solid bits of documentation to the entire approach make you more comfortable with that naming?

FWIW, the namespace comment should not be a requirement for this particular patch. I think it is totally fine to keep it as it is and do the change outside of this patch to help the review. Moreover, we will probably need to move some of the existing components already.

As for considering changing the name from Value because of how many other Value types we have already... that's both a reason to rename and reason not to rename. I think I'm fine leaving it as Value so long as it's in a novel namespace.

Ok, if that's not confusing, then let's keep it the way it was. That way it should be easier to adopt that downstream for sure!

161–163

We have a few components like the Lexer that are extremely prone to performance regressions.

In terms for a longer-term plan in addressing this there are some steps could help IMO. First, this component is relatively standalone and very few changes will be required over time, for these I am hoping to be listed as a reviewer. Second, we can add a comment in the include area, making a note that including anything here will degrade the performance of almost all interpreted code. Third, we will find out about this in our downstream use-cases as the things get significantly slower.

Neither is silver bullet but that's probably the best we could do at that time. Btw, we might be able to add a test that's part of LLVM's performance analysis infrastructure.

clang/lib/Interpreter/Interpreter.cpp
211

Can you add an #ifdef __cplusplus and add a value printing tests that run in clang-repl in C mode?

junaire updated this revision to Diff 518285.Apr 30 2023, 2:45 AM
junaire marked 3 inline comments as done.

Only enable __clang_Interpreter_SetValueCopyArr support in C++ mode.

junaire added inline comments.Apr 30 2023, 3:35 AM
clang/include/clang/Interpreter/Value.h
122

I added a test case for this.

144

We want to rename:
getAs --> as
castAs --> convertTo

Does these look good to you?

junaire marked an inline comment as done.May 1 2023, 8:18 AM
junaire added inline comments.
clang/lib/Interpreter/Interpreter.cpp
211

After a private chat, we decide to disable the feature in C mode because it sounds like a bad idea to give C incomplete value printing support.

aaron.ballman added inline comments.May 2 2023, 8:48 AM
clang/include/clang/Interpreter/Value.h
47

We can include the others that you suggest. All the relevant ones are described in clang::BuiltinType::getName.

Okay, so the plan is to handle all the builtin types (_BitInt, _Complex, various floating point formats, etc)? Will that be part of this patch or in follow-up work? (My intuition is that we should consider it up front because some of the builtin types are interesting -- like _BitInt because it's parameterized, which makes it novel compared to the other types.)

That's a good point. I think the current patch offers a new capability and it is probably fine to not address all at once. My concern is that @junaire has a month to work on these things and this patch is the first out of two patches. The risk here is to drop the ball on that altogether if we add more work as a requirement for that patch to go in.

Yeah, the current patch is incremental progress, so I think it's defensible to leave this to follow-up work. But I worry that follow-up work is going to be hampered by the design choices made here, and I mostly want to avoid a situation where this works inconsistently and that's "good enough" for an extended period of time. It feels a bit like working with any builtin type is part of the MVP. However, I don't insist because this is still useful forward progress for a lot of use cases.

The only problem I have with repl is misleading. repl usually means to people something that you run at your prompt and can type some expressions in and see some results. However, here we are building a foundational primitive to allow embedding an interpreter as part of your static program and be able to cross the compiler/interpreter boundary. The compiler-as-a-service (caas) is probably the closest I could find in CS research terminology to describe that. Would adding some solid bits of documentation to the entire approach make you more comfortable with that naming?

On the one hand, if we have to document what the name means, the name isn't very meaningful. On the other hand, we use ento for the static analyzer (because of "entomologist", one who studies bugs...), so I don't think users routinely get hung up by not understanding the namespace identifier. If you think caas is the best name for this, then let's go with that unless someone has a better suggestion.

FWIW, the namespace comment should not be a requirement for this particular patch. I think it is totally fine to keep it as it is and do the change outside of this patch to help the review. Moreover, we will probably need to move some of the existing components already.

Agreed.

84

Yeah, that's a fair point, except nothing actually validates that the opaque pointer you are handed is actually valid for anything because it eventually just does a reinterpret_cast, so I don't think the constructor will fail.

122

Thank you for the test case!

144

Yeah, I think those are good replacements, thank you!

161–163

Neither is silver bullet but that's probably the best we could do at that time. Btw, we might be able to add a test that's part of LLVM's performance analysis infrastructure.

Yeah, we should probably consider doing that. But to make sure I understand the performance concerns... when we change functionality in the lexer, we (potentially) slow down the lexing phase of compilation. That's straightforward and unsurprising. But in this case, it sounds like the act of including another header file in this header file will cause a runtime performance concern, even if no other changes are made. If I'm correct, I can't think of anything else in the compiler that works like that.

clang/lib/Interpreter/Interpreter.cpp
211

Is that disable happening in this patch, or is it in a different patch?

511

The ArrSize argument is being passed for Ptr in the call... is that correct? (It makes sense to me, but I don't get why the parameter is named Ptr to begin with.)

clang/lib/Interpreter/InterpreterUtils.cpp
20

This question applies more generally than just this function, but should we be requiring these interfaces to supply a SourceLocation rather than hard-coding no location information? That can make it easier to see what's going on when dumping the AST for debugging purposes, etc even if it's not necessary for diagnostics or other reasons. (I don't insist, just a speculative question about the interface.)

56–59
clang/lib/Interpreter/InterpreterUtils.h
39

What is Ptr pointing to?

Should this be renamed to UIntPtrTLiteralExpr with a comment that says it creates an integer literal whose type is uinptr_t?

clang/lib/Interpreter/Value.cpp
92

That's all the more reason to document the numbers -- should they stay in sync with Cling?

100
103

Then reuse BT below instead of converting it again. (It's usually a code smell to have isa<> followed by cast<> operations.)

junaire updated this revision to Diff 518952.May 2 2023, 8:24 PM
junaire marked 16 inline comments as done.

Address comments from @aaron.ballman, thanks!

clang/include/clang/Interpreter/Value.h
47

Thanks for the comments from @aaron.ballman & @v.g.vassilev. So IIUC we'll handle the builtin types in later patches and add a namespace (caas) for the entire libclangInterpreter infrastructure.

161–163

I believe what @v.g.vassilev means is that the repl itself might include Value.h as a part of *runtime*, so if the header is heavy, you can notice the repl is slowed down. (That's obvious) So keep in mind we're breaking the boundary between the compiled code and interpreted code (It's kinda confusing) here it actually impacts interpreted code.

clang/lib/Interpreter/Interpreter.cpp
211

In https://reviews.llvm.org/D148997, See ParseStmt.cpp:551

511

Sorry, my bad. I renamed the argument name, it should make sense now. (Pass ArrSize as an interger literal`)

clang/lib/Interpreter/InterpreterUtils.cpp
20

Keep in mind these helpers are actually used for *synthesizing* AST nodes, so there are no SourceLocations available here. If you dumped the AST, you probably will see something like:

CallExpr 0x621000093270 'void'
|-ImplicitCastExpr 0x621000093250 'void (*)(void *, void *, void *, unsigned long long)' <FunctionToPointerDecay>
| `-DeclRefExpr 0x6210000931f8 'void (void *, void *, void *, unsigned long long)' lvalue Function 0x62100008fd98 '__clang_Interpreter_SetValueNoAlloc' 'void (void *, void *, void *, unsigned long long)'
|-CStyleCastExpr 0x621000093060 'void *' <IntegralToPointer>
| `-IntegerLiteral 0x621000093020 'unsigned long' 106515188942880
|-CStyleCastExpr 0x6210000930d0 'void *' <IntegralToPointer>
| `-IntegerLiteral 0x621000093090 'unsigned long' 106515188942912
|-CStyleCastExpr 0x621000093140 'void *' <IntegralToPointer>
| `-IntegerLiteral 0x621000093100 'unsigned long' 107820859101728
`-CStyleCastExpr 0x6210000931c8 'unsigned long long' <NoOp>
  `-ImplicitCastExpr 0x6210000931a8 'unsigned long long' <IntegralCast> part_of_explicit_cast
    `-ImplicitCastExpr 0x621000093188 'int' <LValueToRValue> part_of_explicit_cast
      `-DeclRefExpr 0x621000092ec0 'int' lvalue Var 0x621000092d50 'x' 'int'

Horrible, but that's all we can do :(

clang/lib/Interpreter/InterpreterUtils.h
39

I think it should be uint64_t. Fixed.

junaire updated this revision to Diff 518960.May 2 2023, 9:10 PM
junaire marked 2 inline comments as done.

Add more comments.

clang/include/clang/Interpreter/Interpreter.h
60

I added some comments.

clang/include/clang/Interpreter/Value.h
84

OK, let me try to answer why we're passing QualType in an opaque pointer way. So the Value is *constructed* in __clang_Interpreter_SetValue* variants, and these functions (let's call them runtime interfaces) are synthesized when we want to do value printing (like we have a x without semi). So it makes things pretty hard if we use a complete type (How can we synthesize that type? That dramatically complicated the code) In addition, we want Value.h, which is part of the runtime, as lightweight as possible, so we can't use the complete type in its constructor, or we have to include the corresponding header file.

junaire updated this revision to Diff 518961.May 2 2023, 9:13 PM

Fix typos

aaron.ballman added inline comments.May 3 2023, 12:32 PM
clang/include/clang/Interpreter/Value.h
47

That matches my understanding.

84

Okay, I think I was getting hung up on your second point:

In addition, we want Value.h, which is part of the runtime, as lightweight as possible, so we can't use the complete type in its constructor, or we have to include the corresponding header file.")

while forgetting your first point:

So the Value is *constructed* in __clang_Interpreter_SetValue* variants, and these functions (let's call them runtime interfaces) are synthesized when we want to do value printing (like we have a x without semi). So it makes things pretty hard if we use a complete type (How can we synthesize that type? That dramatically complicated the code)

The first point makes sense to me and is pretty reasonable rationale for why we pass void *, thanks! The second point still isn't particularly compelling to me (more on that below).

161–163

I believe what @v.g.vassilev means is that the repl itself might include Value.h as a part of *runtime*, so if the header is heavy, you can notice the repl is slowed down. (That's obvious) So keep in mind we're breaking the boundary between the compiled code and interpreted code (It's kinda confusing) here it actually impacts interpreted code.

I'm not certain that's a reasonable design choice to make. Or, stated somewhat differently, I'm really uncomfortable with having header files we can't maintain because changing them impacts runtime performance in surprising ways. That's not a sustainable design even if we think this header will be reasonably stable. We need *some* amount of abstraction here so that we can have a clean design for the REPL interpreter without NFC code changes impacting performance. Otherwise, people will be discouraged from adding comments to this file (those take time to lex, after all), or using long identifiers (longer identifiers take longer to lex than shorter ones), or including what is used instead of using void * (as being discussed here), and so on.

This is quite probably something you've already thought about plenty, but... could we add an abstraction layer so that the interpreter side of things has a "low-token-count" interface that dispatches through to the actual implementation?

clang/lib/Interpreter/InterpreterUtils.cpp
20

So now this interface takes a uint64_t but it creates a uintptr_t literal; should this be making a uint64_t literal instead?

20

Ah, that's true, this is all synthesized.

clang/lib/Interpreter/InterpreterUtils.h
39

I'm still seeing IntegerLiteral *IntegerLiteralExpr(ASTContext &C, uintptr_t Ptr); for the declaration (the definition is using uint64_t though).

v.g.vassilev added inline comments.May 7 2023, 1:50 PM
clang/include/clang/Interpreter/Value.h
161–163

I believe what @v.g.vassilev means is that the repl itself might include Value.h as a part of *runtime*, so if the header is heavy, you can notice the repl is slowed down. (That's obvious) So keep in mind we're breaking the boundary between the compiled code and interpreted code (It's kinda confusing) here it actually impacts interpreted code.

I'm not certain that's a reasonable design choice to make. Or, stated somewhat differently, I'm really uncomfortable with having header files we can't maintain because changing them impacts runtime performance in surprising ways. That's not a sustainable design even if we think this header will be reasonably stable. We need *some* amount of abstraction here so that we can have a clean design for the REPL interpreter without NFC code changes impacting performance. Otherwise, people will be discouraged from adding comments to this file (those take time to lex, after all), or using long identifiers (longer identifiers take longer to lex than shorter ones), or including what is used instead of using void * (as being discussed here), and so on.

All valid points. I guess we have seen some changes related to compilation speed in the past in the STLExtras.h (iirc, although I cannot find the right commit). We did particular changes to the header file to reduce the compilation time of some large TU builds. I'd think that's more like the case of stddef.h and similar headers in the resource directory. The more we add the worse becomes the compiler startup time.

This is quite probably something you've already thought about plenty, but... could we add an abstraction layer so that the interpreter side of things has a "low-token-count" interface that dispatches through to the actual implementation?

Yes, I have a plan that's quite ambitious (and a draft RFC): basically the idea is any #include to become a no-op for the compiler unless something is actually needed.

I understand your concern here but I don't really know how to address it in this particular patch.

clang/lib/Interpreter/InterpreterUtils.cpp
20

Outside of this review, we've been entertaining the idea of providing a source location scratch area which represents some memory region of the decompiled to text synthesized AST. That can solve a number of general issues with synthesized code but we have not got around to experiment with it.

clang/lib/Interpreter/Value.cpp
92

I believe the explanation is at the use in isAlive:

// Check whether the storage is valid by validating the canary bits.
// If someone accidentally write some invalid bits in the storage, the canary
// will be changed first, and `IsAlive` will return false then.

I'd suggest here we can say that these are random numbers that allow us to detect if someone wrote bits in our storage accidentally. Or actually refer the the documentation in isAlive.

using regular global variable instances to manage the storage on the executor side, an extended MemoryAccess interface to read/write the value from the REPL side when needed (e.g. for printing), and emitting glue functions to pass the variable's value in to callers.

Agree, that's probably the better solution. Just for the record: Values couldn't be temporaries and access must be synchronized with execution to avoid races. I guess both is easily acceptable.

clang/include/clang/Interpreter/Value.h
161–163

[...] include Value.h as a part of *runtime*, so if the header is heavy, you can notice the repl is slowed down. (That's obvious) So keep in mind we're breaking the boundary between the compiled code and interpreted code (It's kinda confusing) here it actually impacts interpreted code.

I'm really uncomfortable with having header files we can't maintain because changing them impacts runtime performance in surprising ways. [...] could we add an abstraction layer so that the interpreter side of things has a "low-token-count" interface that dispatches through to the actual implementation?

Agree. What about splitting this up in 3 parts?
(1) Private API: interface as consumed by the libclangInterpreter
(2) Public API: interface as consumed by tools, other LLVM projects and out-of-tree
(3) Client API: the actual runtime-only parts with low-token-count etc.

(1) and (2) would be here in the source-tree. (3) is a resource file similar to intrinsics headers, I believe they have very similar requirements on performance, maintenance and versioning.

clang/lib/Interpreter/Interpreter.cpp
539

Perfect

v.g.vassilev added inline comments.May 8 2023, 2:07 AM
clang/include/clang/Interpreter/Value.h
161–163

I agree with this suggestion. We have actually tried (3) and discovered that it is challenging to write unit tests for it. The main issue is that the clang tests have been designed to avoid relying on the resource directory, which is relative to the clang binary. Moreover, some builds redefine the location of the resource directory during compilation, making it difficult to locate and pass to a unit test whose location is independent of the clang binary's location...

junaire updated this revision to Diff 520324.May 8 2023, 3:53 AM

Rebase + Update

junaire updated this revision to Diff 520326.May 8 2023, 4:02 AM
junaire marked an inline comment as done.

.

junaire marked 2 inline comments as done.May 8 2023, 4:04 AM
junaire marked 10 inline comments as done.May 12 2023, 12:30 AM
junaire added inline comments.
clang/include/clang/Interpreter/Value.h
161–163

Hi @aaron.ballman, I would like to try to explain why we don't have a better approach again.

Value is very special, it's not a regular runtime that would only be used by the REPL, instead, it acts like a messager, and connects the compiled code and the interpreted code. That said both sides need to reference it. so it can't be put into the resource directory, or the host compiler that is used for compiling clang-repl can't find it, then we fail to compile. What's more, it can't be put into compiler-rt too, same reason, Interpreter.h needs to include the header. The only possible location I can think of is the regular include/clang/Interpter.

When it comes to maintenance issues, I believe that's fine. IIUC your main concern is that people will be afraid to touch this file since it has a surprising performance impact. However, the whole header is only about the declaration of Value class and it should only be used for this purpose. So unless someone is working on clang-repl, it's unlikely for them to touch it. Another thing I would like to mention is that the performance is only noticeably dropped if we pulled some large headers, like <string>, <memory>, or something like that. NFC changes like fixing typos, adding comments, and adjusting function names are totally fine. (They can't simply refactor the void* thing, or the whole program simplify fail to compile)

We could also add some comments to indicate the specificity of this header, and why people should be cautious when touching it.

aaron.ballman added inline comments.May 15 2023, 9:01 AM
clang/include/clang/Interpreter/Value.h
161–163

I had a nice off-list conversation with @v.g.vassilev about my concerns and we think we've got a path forward to get you unblocked.

Put a gigantic, obvious note at the top of Value.h saying that 1) changes to this file impact *runtime* performance of the interpreter and so extreme caution should be used when making changes to the file, especially when including new headers., 2) A FIXME comment that explains why this design is a problem and what the very vague idea is for addressing the issue, and that the contents of this file are experimental and subject to change.

#1 helps code reviewers remember that this file is special right now, and #2 makes it clear that nobody should rely on the API being stable because we're hopefully going to make significant changes to it in the relatively near future.

(Note: one of the conversation topics was that there is interest in running the interpreter on embedded systems, which is another compelling reason that we'll need a less token-heavy interface for the performance sensitive parts.)

junaire updated this revision to Diff 522527.May 16 2023, 3:52 AM

Add comments

v.g.vassilev added inline comments.May 16 2023, 3:59 AM
clang/include/clang/Interpreter/Value.h
24

I would move the note right after the only #include we got.

junaire updated this revision to Diff 522530.May 16 2023, 4:01 AM

Address comment from Vassil, thx

v.g.vassilev accepted this revision.May 16 2023, 4:03 AM

I believe we are at the stage of better is enemy of good now. This looks good to me. I'd recommend to wait for @aaron.ballman's final review.

This revision is now accepted and ready to land.May 16 2023, 4:03 AM
junaire marked 11 inline comments as done.May 16 2023, 4:03 AM
aaron.ballman accepted this revision.May 16 2023, 4:33 AM

LGTM aside from a teeny tiny whitespace nit.

clang/include/clang/Interpreter/Value.h
28
junaire updated this revision to Diff 522545.May 16 2023, 4:50 AM

remove whitespace

Because I have edited the commit messages several times so I forgot to include the revision link. Sorry. I have landed it as https://reviews.llvm.org/rGa423b7f1d7ca8b263af85944f57a69aa08fc942c

junaire closed this revision.May 16 2023, 5:25 AM