Page MenuHomePhabricator

[clangd] Introduced a Context that stores implicit data

Authored by ilya-biryukov on Nov 27 2017, 4:50 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ilya-biryukov edited edge metadata.Dec 5 2017, 8:40 AM

Addressed some review comments.

  • Removed global context.
  • Context now stores a shared_ptr to ContextData to handle lifetimes properly.
  • Added helper getters for some commonly used types to Context.
  • Removed PtrKey, this simplified code a bit.
ilya-biryukov marked 6 inline comments as done.Dec 5 2017, 8:56 AM

A second attempt at implementing the Contexts. Still missing a few comments, but hopefully ready for review.


Remove the global context altogether.


Removed the global context.


Added a derive() method to the Context. It currently returns ContextBuilder, I still think it's nice to have the ContextBuilder and a separation between an immutable Context and a mutable ContextBuilder. Happy to chat on whether we want to remove it.

1 ↗(On Diff #124354)

Unfortunately, typeid does not with -fno-rtti, so there does not seem to be a way to significantly simplify the implementation (i.e. we'll need some form of Key<Type> anyway, at least internally).
Therefore I'd keep the current interface. Does that sound ok?

76 ↗(On Diff #124354)

Removed PtrKey, added a getPtr helper method to the ContextData instead (that does the same thing).
It's useful to have those easy-to-use helpers for common cases.

Totally agree that having a separate PtrKey for them is an overkill.

ilya-biryukov marked an inline comment as done.
  • Removed unused helpers from Context.h
  • Use assert instead of llvm_unreachable in getExisiting(Key<>)

There's a couple of points where we might be a bit stuck on complexity vs X tradeoffs (map + contextbuilder, and some of the convenience APIs).

Just want to mention: if I find these things complex, it doesn't mean I think they're bad code, or even that you should agree they're complex! Happy to pull in a third opinion if you think we can't or shouldn't live without them.


This could use a bit more I think, e.g.

A context is an immutable container for per-request data that must be propagated through layers that don't care about it.
An example is a request ID that we may want to use when logging.

Conceptually, a context is a heterogeneous map<Key<T>, T>. Each key has 
an associated value type, which allows the map to be typesafe.

You can't add data to an existing context, instead you create a new immutable context derived from it with extra data added. When you retrieve data, the context will walk up the parent chain until the key is found.

Contexts should be:
 - passed by reference when calling synchronous functions
 - passed by value (move) when calling asynchronous functions. The callback will receive the context again.
 - copied only when 'forking' an asynchronous computation that we don't wait for

Some of this is on the class comment - this seems fine but the Context class should then be the first thing in the file.


IIUC, the only reason we expose separate Context and ContextData types is to give things like Span a stable reference to hold onto (ContextData*).

Could we use Context instead (a copy)? Then we incur shared_ptr overhead for span creation, but greatly simplify the interface.
If we want to avoid the overhead, the Span could maybe grab whatever it needs out of the context at construction time, for use in the destructor.

Either way, we should make sure Context is front-and-center in this header, and I hope we can hide ContextData entirely.


nit: const Key&, and below


this isn't usually how we define const-correctness for containers.
A const method shouldn't return a mutable reference to the contained object.


I'd prefer the standard name at for this function (assert vs throw being pretty minor I think), but YMMV


This is unused - remove until we need it?


This is only used where getExisting() would be better - remove until we need it?


With the new shared_ptr semantics:

Context D = move(C).derive(K1, V1).derive(K2, V2);

Is just as meaningful as

Context D = move(C).derive().add(K1, V1).add(K2, V2);

Yeah, the list of maps in an implementation detail. It's one that comes with a bunch of complexity (ContextBuilder and most of TypedValueMap). It really doesn't seem to buy us anything (the performance is both uninteresting and seems likely to be worse in this specific case with very few entries).


Nit: is destroyed after --> outlives?

If you're going to repeat that this is important, then say why :-)


If we're going to define this as a real class (rather than use/inherit shared_ptr), we should really be able to put the API on it directly (rather than require the user to ->)


If we discourage copies, you probably want a moving ContextBuilder derive() && overload


nit: const


nit: how do you feel about Context::empty()? it ties the name to the type more clearly, I think.


OOC: why reference and not a value?


I think we should spell this emptyCtx().derive().
It should be pretty rare to derive from the empty context in production code - and conceptually it's no different than any other use of the empty context, I think.

1 ↗(On Diff #124354)

Yes, I find Key<T> confusing compared to a type-based interface. I can live with it though - it's certainly conceptually nicer.

Oops, forgot one important question!

BTW, phabricator seems to have misaligned all my previous comments (you updated the diff before I submitted them I guess). Let me know if any of them don't make sense now!


I think we should strongly consider calling the class Ctx over Context. It's going to appear in many function signatures, and I'm not sure the extra characters buy us anything considering the abbreviation is pretty unambiguous, and the full version isn't very explicit.

ioeric added inline comments.Dec 6 2017, 7:47 AM

I have no opinion on the names... Just wondering: if the class is called Ctx, what name do you suggest to use for variables? With Context, you can probably use Ctx. But... LLVM variable naming is sad... :(

ilya-biryukov added inline comments.Dec 7 2017, 5:58 AM

I have considered making Context the only used interface, but I hated the fact that each Span constructor would copy a shared_ptr. (And Spans can be pretty ubiquitous).
On the other hand, that should not happen when tracing is off, so we won't be adding overhead when Span are not used.
Will try moving ContextData into .cpp file, this should certainly be an implementation detail. And we can expose it later if we'll actually have a use-case for that (i.e. not copying shared_ptrs).


I'd also go with Context Ctx rather than Ctx C. I'd say Context is quite a small and concise name. It even reads better than Ctx to me.
But I'm fine either way, so will change the name to Ctx.

  • Made ContextData a private member of Context.
  • Added clone method.
  • Added forgotten header guard in TypedValueMap.h
  • Pass Key<> by const-ref instead of by-ref.
  • Pass Context by-const-ref instead of by-ref.
  • Made derive() const.
  • Added docs.
  • Return const Type* instead of Type* in map getters.
ilya-biryukov marked 4 inline comments as done.
  • Added a comment about the Parent vs Data lifetimes.
  • Rephrase the comment.
ilya-biryukov marked 2 inline comments as done.
  • Added r-value overload for derive().
  • Replaced emptyCtx with Context::empty
ilya-biryukov added inline comments.Dec 11 2017, 2:47 AM

Done. It's in the class comment, but class is now the first thing in the file.
Thanks for making my life a bit easier and coming up with a doc :-)


Done. We now store a copied Context in the Spans.


Returning const references now.


We've discussed this offline, but just for the record:
I kinda like that getExisting is named after get. Also at usually throws, whereas our function only asserts (even though LLVM does not use execptions, we still have undefined behaviour vs crash in runtime).


nit: how do you feel about Context::empty()? it ties the name to the type more clearly, I think.


OOC: why reference and not a value?

Reference gives us just the right semantics with less overhead (no extra shared_ptr copies) when we need to pass empty context by reference (e.g., to sync functions like log).


I'd argue the separate function is more readable and saves us an extra lookup in the empty context for missing keys.
Given that emptyCtx is now Context::empty, maybe we should also change buildCtx to Context::build?

ilya-biryukov marked 3 inline comments as done.Dec 11 2017, 2:49 AM
ilya-biryukov added inline comments.Dec 11 2017, 2:55 AM

The thing I like about it is that the Contexts are layered properly in a sense that there's a Context corresponding to the request, a Context corresponding to the forked subrequests, etc.
If we change the interface, we'll be creating a bunch of temporary Contexts that don't correspond to a nice meaningful abstraction (like request) in my head, even though we don't give those contexts any names.

I do agree we currently pay with some complexity for that. Though I'd argue it's all hidden from the users of the interface, as building and consuming contexts is still super-easy and you don't need to mention ContextBuilder or TypedValueMap. And the implementation complexity is totally manageable from my point of view, but I am the one who implemented it in the first place, so there's certainly a bias there.

Thanks for the changes. I don't think TypedValueMap/ContextBuilder pull their weight, but let's get another opinion on this.


I don't see temporary unnamed Contexts being any different from temporary unnamed ContextBuilders.

But we've gone around on this point a bit, and this really seems to be a question of taste. @klimek, can we have a third opinion?

The options we're looking at are:

  • Context stores a map and a parent pointer. derive() returns a ContextBuilder used to create new contexts containing 0 or more new KV pairs. TypedValueMap stores the payloads.
  • Context stores a single KV pair and a parent pointer. derive(K, V) is used to create a new context with one new KV pair. A Key-pointer and AnyStorage in Context store the payloads, the rest of TypedValueMap goes away.

&&, not const&& :-)

Maybe add a trailing //takes ownership


I think this isn't a path we want to make smooth or obvious - usually what you want is to accept a context, derive from it, and use the result.

Embedders will need to create root contexts - we discussed LSPServer offline, and there it seems natural to own a background context and derive from it per handled call.

  • Use derive() && instead of derive() const &&.
ilya-biryukov added inline comments.Dec 11 2017, 4:53 AM

Right. Copy-paste is gonna kill me some day.
Added the comment, too. Although that should be obvious from the && in the signature.

ilya-biryukov marked an inline comment as done.Dec 11 2017, 4:56 AM
  • Removed buildCtx(). Now Contexts can only be created using emptyCtx().derive()
ilya-biryukov marked 2 inline comments as done.Dec 11 2017, 5:05 AM
klimek added inline comments.Dec 11 2017, 7:32 AM

I'd agree that Context::derive(K, V) would be simpler here, mainly because there is only one way to pass around contexts while we're building them up; specifically, there's no temptation to pass around ContextBuilders.

38 ↗(On Diff #126322)

Should we say "from Key<T>*" or &? If you say Key<T> I expect something that compares by value.

  • Removed ContextBuilder and TypedValueMap.
  • Updated the docs.
ilya-biryukov marked an inline comment as done.Dec 11 2017, 8:45 AM
ilya-biryukov added inline comments.

Done. ContextBuilder and TypedValueMap are now removed.

sammccall accepted this revision.Dec 11 2017, 9:14 AM

Thanks, this looks like exactly the right amount of magic to me :-)


as discussed offline, this could also return a value, which might make some use cases (testing) slightly cleaner. Up to you


nit: this is a for loop in disguise :-)


I'd find the interface (and in particular the error messages) here easier to understand if we took a Type by value, rather than having emplace semantics.

If this function took a Type, and TypedAnyStorage took a Type&&, the cost would be one extra move, which doesn't seem too bad.

Up to you, though. (If you change it, don't forget the other derive overload)


nit: now this is private it could just be called Data


Is this comment still true/relevant?
I thought the motivating case was Span, but Span now stores a copy of the parent pointer (and ContextData isn't accessible by it).

ilya-biryukov added inline comments.Dec 12 2017, 2:21 AM

I'd argue we still want to keep this invariant, it gives a natural order of destruction.

ilya-biryukov marked 4 inline comments as done.
  • Change derive() signature to accept a value instead of the variadic arguments.
  • Rewrite a while loop into a for loop.
  • Return a value from Context::empty() instead of a reference.
  • Renamed ContextData to Data.
This revision was not accepted when it landed; it landed in state Needs Review.Dec 12 2017, 3:17 AM
Closed by commit rCTE320468: [clangd] Introduced a Context that stores implicit data (authored by ibiryukov, committed by ). · Explain Why
This revision was automatically updated to reflect the committed changes.