This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Introduced a Context that stores implicit data
ClosedPublic

Authored by ilya-biryukov on Nov 27 2017, 4:50 AM.

Diff Detail

Event Timeline

ilya-biryukov created this revision.Nov 27 2017, 4:50 AM
bkramer requested changes to this revision.Nov 27 2017, 4:59 AM
bkramer added a subscriber: bkramer.
bkramer added inline comments.
clangd/Context.h
80

This is a giant code smell. If we want the context route, please pass contexts everywhere. I really don't want this kind of technical debt in clangd now.

This revision now requires changes to proceed.Nov 27 2017, 4:59 AM
ilya-biryukov added inline comments.Nov 27 2017, 5:15 AM
clangd/Context.h
80

I'm with you on this one, but I think @sammccall was keen on having the global context. The main reason was to always have access to some loggers and tracers, even when it's hard to pass the Context into the function.
It's perfectly easy to remove all usages of globalCtx(), currently only 8 usages to get rid of. However, I'd wait for Sam's comment before doing that.

sammccall edited edge metadata.Nov 28 2017, 1:20 AM

Thanks for doing this!
Most of my comments are of the form "can we make this conceptually simpler, and somewhat less awesome".
This is because it's a somewhat unusual/scary pattern, so I'd like the implementation to be as small and transparent as possible.

clangd/Context.cpp
17

Seems like it would simplify things if:

  • GlobalCtx was always set (static local trick)
  • GlobalSession swapped its context with Global, and then swapped back in its destructor
clangd/Context.h
65

nit: const

66

We add complexity here (implementation and conceptual) to allow multiple properties to be set at the same level (vs having a key and an AnyStorage and making Context a linked list).
Is this for performance? I'm not convinced it'll actually be faster for our workloads, or that it matters.

72

Maybe WithGlobalContext is a good name for this scoped object?

This comment doesn't give a clear idea why someone would want to call this.
Maybe All contexts used by clangd inherit from this global context (including contexts created internally)

80

It's important to be able to call functions that require a context if you don't have one - adding a log statement/trace for debugging shouldn't require changing plumbing/interfaces. (It's fine if we want to avoid checking in such code...)
Having an "empty" global context allows this.

At the same time, we want the ability to set the sink for logs/traces etc globally.

A couple of options:

  • the sink (e.g. Logger) is part of the context, we need to allow embedders to set/augment the global context
  • the sink is not stored in the context, instead it is some other singleton the embedder can set up

I don't have a strong opinion which is better. It's nice to reuse mechanisms, on the other hand loggers vs request IDs are pretty different types of data.

117

This seems more naturally a method on Context, e.g.

Context C = globalCtx().derive(key, value);

The relationship between global and C is clear.

(You can allow chaining+mapping by having Context::derive and ContextBuilder::derive both return ContextBuilder&&, but as noted below I'm not sure it's worth the complexity over Context::derive ->Context)

clangd/TypedValueMap.h
1 ↗(On Diff #124354)

This might be doing a little more than it needs to.

Do we need the ability to have multiple values of the same type?
If request ID is an int, needing to do struct RequestID { ID int }; doesn't seem like much of a burden, and simplifies the semantics here.

And for cases like Logger where the type carries the semantics, keying by type is clearer.

76 ↗(On Diff #124354)

Shouldn't this just be Type *A?
I think the extra convenience of PtrKey probably isn't worth the complexity, though.

ilya-biryukov added inline comments.Nov 28 2017, 2:16 AM
clangd/Context.cpp
17

Will do

clangd/Context.h
66

Conceptually, a Context is more convenient to use when it stores multiple values. This allows to put a bunch of things and assign meaning to Context (i.e., a Context for processing a single LSP request, global context). If Contexts were a linked list, the intermediate Contexts would be hard to assign the meaning to.

That being said, storage strategy for Contexts is an implementation detail and could be changed anytime. I don't have big preferences here, but I think that storing a linked list of maps has, in general, a better performance than storing a linked list.
And given that it's already there, I'd leave it this way.

80

I'd still get rid of it. The less implicit behavior we have, the better.
It does not seem hard to plumb the Context all the way through clangd currently. And it should not be too hard in the future.

I was asking myself multiple times whether we needed the global context in the first place while implementing it.
I think we should aim for avoiding global state altogether. That includes singletons, etc.

117

I think the current interface is simple enough and allows for both storage implementations. I'd really avoid providing an interface that ties us into a single storage implementation.

clangd/TypedValueMap.h
1 ↗(On Diff #124354)

Are Key<> types confusing? I really like the fact that I don't have to specify type information and it is carried in the Key<Type> to the get/emplace methods, i.e. I don't have to specify template arguments there.
I don't see how struct RequestId { ID int }; is clearer or shorter than Key<int> RequestID;. Again, are Key<Type> a confusing concept in the first place?

76 ↗(On Diff #124354)

With Key<A*> the function will be returning A**, which is inconvenient to use.
If we index by Type*, as opposed to Key<Type>, we'll get the same problems.

For example,

Key<Logger*> Key;

Logger ** Val = Map.get(Key); // Or Map.get<Logger*>();
if (!Val)
  return; // No logging is enabled

If (!*Val)
  return; // No logging is enabled, but the nullptr value was found in the map!

(*Val)->logImpl(....); // Log it already!

Note an extra if and extra dereference at the last line.
I do think it's a useful addition that outweighs the complexity.

ilya-biryukov edited edge metadata.Dec 5 2017, 8:40 AM

Addressed some review comments.

  • Removed global context.
  • Context now stores a shared_ptr to ContextData to handle lifetimes properly.
  • Added helper getters for some commonly used types to Context.
  • Removed PtrKey, this simplified code a bit.
ilya-biryukov marked 6 inline comments as done.Dec 5 2017, 8:56 AM

A second attempt at implementing the Contexts. Still missing a few comments, but hopefully ready for review.

clangd/Context.h
72

Remove the global context altogether.

80

Removed the global context.

117

Added a derive() method to the Context. It currently returns ContextBuilder, I still think it's nice to have the ContextBuilder and a separation between an immutable Context and a mutable ContextBuilder. Happy to chat on whether we want to remove it.

clangd/TypedValueMap.h
1 ↗(On Diff #124354)

Unfortunately, typeid does not with -fno-rtti, so there does not seem to be a way to significantly simplify the implementation (i.e. we'll need some form of Key<Type> anyway, at least internally).
Therefore I'd keep the current interface. Does that sound ok?

76 ↗(On Diff #124354)

Removed PtrKey, added a getPtr helper method to the ContextData instead (that does the same thing).
It's useful to have those easy-to-use helpers for common cases.

Totally agree that having a separate PtrKey for them is an overkill.

ilya-biryukov marked an inline comment as done.
  • Removed unused helpers from Context.h
  • Use assert instead of llvm_unreachable in getExisiting(Key<>)

There's a couple of points where we might be a bit stuck on complexity vs X tradeoffs (map + contextbuilder, and some of the convenience APIs).

Just want to mention: if I find these things complex, it doesn't mean I think they're bad code, or even that you should agree they're complex! Happy to pull in a third opinion if you think we can't or shouldn't live without them.

clangd/Context.h
12

This could use a bit more I think, e.g.

A context is an immutable container for per-request data that must be propagated through layers that don't care about it.
An example is a request ID that we may want to use when logging.

Conceptually, a context is a heterogeneous map<Key<T>, T>. Each key has 
an associated value type, which allows the map to be typesafe.

You can't add data to an existing context, instead you create a new immutable context derived from it with extra data added. When you retrieve data, the context will walk up the parent chain until the key is found.

Contexts should be:
 - passed by reference when calling synchronous functions
 - passed by value (move) when calling asynchronous functions. The callback will receive the context again.
 - copied only when 'forking' an asynchronous computation that we don't wait for

Some of this is on the class comment - this seems fine but the Context class should then be the first thing in the file.

26

IIUC, the only reason we expose separate Context and ContextData types is to give things like Span a stable reference to hold onto (ContextData*).

Could we use Context instead (a copy)? Then we incur shared_ptr overhead for span creation, but greatly simplify the interface.
If we want to avoid the overhead, the Span could maybe grab whatever it needs out of the context at construction time, for use in the destructor.

Either way, we should make sure Context is front-and-center in this header, and I hope we can hide ContextData entirely.

33

nit: const Key&, and below

33

this isn't usually how we define const-correctness for containers.
A const method shouldn't return a mutable reference to the contained object.

43

I'd prefer the standard name at for this function (assert vs throw being pretty minor I think), but YMMV

50

This is unused - remove until we need it?

62

This is only used where getExisting() would be better - remove until we need it?

66

With the new shared_ptr semantics:

Context D = move(C).derive(K1, V1).derive(K2, V2);

Is just as meaningful as

Context D = move(C).derive().add(K1, V1).add(K2, V2);

Yeah, the list of maps in an implementation detail. It's one that comes with a bunch of complexity (ContextBuilder and most of TypedValueMap). It really doesn't seem to buy us anything (the performance is both uninteresting and seems likely to be worse in this specific case with very few entries).

69

Nit: is destroyed after --> outlives?

If you're going to repeat that this is important, then say why :-)

83

If we're going to define this as a real class (rather than use/inherit shared_ptr), we should really be able to put the API on it directly (rather than require the user to ->)

97

If we discourage copies, you probably want a moving ContextBuilder derive() && overload

97

nit: const

140

nit: how do you feel about Context::empty()? it ties the name to the type more clearly, I think.

140

OOC: why reference and not a value?

143

I think we should spell this emptyCtx().derive().
It should be pretty rare to derive from the empty context in production code - and conceptually it's no different than any other use of the empty context, I think.

clangd/TypedValueMap.h
1 ↗(On Diff #124354)

Yes, I find Key<T> confusing compared to a type-based interface. I can live with it though - it's certainly conceptually nicer.

Oops, forgot one important question!

BTW, phabricator seems to have misaligned all my previous comments (you updated the diff before I submitted them I guess). Let me know if any of them don't make sense now!

clangd/Context.h
64

I think we should strongly consider calling the class Ctx over Context. It's going to appear in many function signatures, and I'm not sure the extra characters buy us anything considering the abbreviation is pretty unambiguous, and the full version isn't very explicit.

ioeric added inline comments.Dec 6 2017, 7:47 AM
clangd/Context.h
64

I have no opinion on the names... Just wondering: if the class is called Ctx, what name do you suggest to use for variables? With Context, you can probably use Ctx. But... LLVM variable naming is sad... :(

ilya-biryukov added inline comments.Dec 7 2017, 5:58 AM
clangd/Context.h
26

I have considered making Context the only used interface, but I hated the fact that each Span constructor would copy a shared_ptr. (And Spans can be pretty ubiquitous).
On the other hand, that should not happen when tracing is off, so we won't be adding overhead when Span are not used.
Will try moving ContextData into .cpp file, this should certainly be an implementation detail. And we can expose it later if we'll actually have a use-case for that (i.e. not copying shared_ptrs).

64

I'd also go with Context Ctx rather than Ctx C. I'd say Context is quite a small and concise name. It even reads better than Ctx to me.
But I'm fine either way, so will change the name to Ctx.

  • Made ContextData a private member of Context.
  • Added clone method.
  • Added forgotten header guard in TypedValueMap.h
  • Pass Key<> by const-ref instead of by-ref.
  • Pass Context by-const-ref instead of by-ref.
  • Made derive() const.
  • Added docs.
  • Return const Type* instead of Type* in map getters.
ilya-biryukov marked 4 inline comments as done.
  • Added a comment about the Parent vs Data lifetimes.
  • Rephrase the comment.
ilya-biryukov marked 2 inline comments as done.
  • Added r-value overload for derive().
  • Replaced emptyCtx with Context::empty
ilya-biryukov added inline comments.Dec 11 2017, 2:47 AM
clangd/Context.h
12

Done. It's in the class comment, but class is now the first thing in the file.
Thanks for making my life a bit easier and coming up with a doc :-)

26

Done. We now store a copied Context in the Spans.

33

Returning const references now.

43

We've discussed this offline, but just for the record:
I kinda like that getExisting is named after get. Also at usually throws, whereas our function only asserts (even though LLVM does not use execptions, we still have undefined behaviour vs crash in runtime).

140

nit: how do you feel about Context::empty()? it ties the name to the type more clearly, I think.

Done.

OOC: why reference and not a value?

Reference gives us just the right semantics with less overhead (no extra shared_ptr copies) when we need to pass empty context by reference (e.g., to sync functions like log).

143

I'd argue the separate function is more readable and saves us an extra lookup in the empty context for missing keys.
Given that emptyCtx is now Context::empty, maybe we should also change buildCtx to Context::build?

ilya-biryukov marked 3 inline comments as done.Dec 11 2017, 2:49 AM
ilya-biryukov added inline comments.Dec 11 2017, 2:55 AM
clangd/Context.h
66

The thing I like about it is that the Contexts are layered properly in a sense that there's a Context corresponding to the request, a Context corresponding to the forked subrequests, etc.
If we change the interface, we'll be creating a bunch of temporary Contexts that don't correspond to a nice meaningful abstraction (like request) in my head, even though we don't give those contexts any names.

I do agree we currently pay with some complexity for that. Though I'd argue it's all hidden from the users of the interface, as building and consuming contexts is still super-easy and you don't need to mention ContextBuilder or TypedValueMap. And the implementation complexity is totally manageable from my point of view, but I am the one who implemented it in the first place, so there's certainly a bias there.

Thanks for the changes. I don't think TypedValueMap/ContextBuilder pull their weight, but let's get another opinion on this.

clangd/Context.h
66

I don't see temporary unnamed Contexts being any different from temporary unnamed ContextBuilders.

But we've gone around on this point a bit, and this really seems to be a question of taste. @klimek, can we have a third opinion?

The options we're looking at are:

  • Context stores a map and a parent pointer. derive() returns a ContextBuilder used to create new contexts containing 0 or more new KV pairs. TypedValueMap stores the payloads.
  • Context stores a single KV pair and a parent pointer. derive(K, V) is used to create a new context with one new KV pair. A Key-pointer and AnyStorage in Context store the payloads, the rest of TypedValueMap goes away.
93

&&, not const&& :-)

Maybe add a trailing //takes ownership

143

I think this isn't a path we want to make smooth or obvious - usually what you want is to accept a context, derive from it, and use the result.

Embedders will need to create root contexts - we discussed LSPServer offline, and there it seems natural to own a background context and derive from it per handled call.

  • Use derive() && instead of derive() const &&.
ilya-biryukov added inline comments.Dec 11 2017, 4:53 AM
clangd/Context.h
93

Right. Copy-paste is gonna kill me some day.
Added the comment, too. Although that should be obvious from the && in the signature.

ilya-biryukov marked an inline comment as done.Dec 11 2017, 4:56 AM
  • Removed buildCtx(). Now Contexts can only be created using emptyCtx().derive()
ilya-biryukov marked 2 inline comments as done.Dec 11 2017, 5:05 AM
klimek added inline comments.Dec 11 2017, 7:32 AM
clangd/Context.h
66

I'd agree that Context::derive(K, V) would be simpler here, mainly because there is only one way to pass around contexts while we're building them up; specifically, there's no temptation to pass around ContextBuilders.

clangd/TypedValueMap.h
38 ↗(On Diff #126322)

Should we say "from Key<T>*" or &? If you say Key<T> I expect something that compares by value.

  • Removed ContextBuilder and TypedValueMap.
  • Updated the docs.
ilya-biryukov marked an inline comment as done.Dec 11 2017, 8:45 AM
ilya-biryukov added inline comments.
clangd/Context.h
66

Done. ContextBuilder and TypedValueMap are now removed.

sammccall accepted this revision.Dec 11 2017, 9:14 AM

Thanks, this looks like exactly the right amount of magic to me :-)

clangd/Context.h
92

as discussed offline, this could also return a value, which might make some use cases (testing) slightly cleaner. Up to you

110

nit: this is a for loop in disguise :-)

132

I'd find the interface (and in particular the error messages) here easier to understand if we took a Type by value, rather than having emplace semantics.

If this function took a Type, and TypedAnyStorage took a Type&&, the cost would be one extra move, which doesn't seem too bad.

Up to you, though. (If you change it, don't forget the other derive overload)

169

nit: now this is private it could just be called Data

170

Is this comment still true/relevant?
I thought the motivating case was Span, but Span now stores a copy of the parent pointer (and ContextData isn't accessible by it).

ilya-biryukov added inline comments.Dec 12 2017, 2:21 AM
clangd/Context.h
170

I'd argue we still want to keep this invariant, it gives a natural order of destruction.

ilya-biryukov marked 4 inline comments as done.
  • Change derive() signature to accept a value instead of the variadic arguments.
  • Rewrite a while loop into a for loop.
  • Return a value from Context::empty() instead of a reference.
  • Renamed ContextData to Data.
This revision was not accepted when it landed; it landed in state Needs Review.Dec 12 2017, 3:17 AM
This revision was automatically updated to reflect the committed changes.