This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Config: Fragments and parsing from YAML
ClosedPublic

Authored by sammccall on Jun 23 2020, 8:23 AM.

Details

Summary

This is a piece from the design in https://tinyurl.com/clangd-config
https://reviews.llvm.org/D82335 is a draft with all the bits linked together.
It doesn't do anything yet, it's just a library + tests.

It deliberately implements a minimal set of actual configuration options.

Diff Detail

Event Timeline

sammccall created this revision.Jun 23 2020, 8:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2020, 8:23 AM
kadircet added inline comments.Jun 23 2020, 3:07 PM
clang-tools-extra/clangd/ConfigFragment.h
9

i think this paragraph belongs to Config.h

14

s/which is models/which models/

16

again i don't think these are relevant here.

maybe provide more details about any new config option should start propagating from here, as this is the closest layer to user written config,

61

what about fromYAML and returning vector<const Fragment> ?

65

why the bundling?

69

maybe invalid/absent rather than ignored (or change it to say should be ignored) ? I am assuming this is referring to locations of config parameters like Adds and conditions.

73

what is the use of this ?

75

why not make this const ? i don't think it makes sense to modify these after creation.

77

comments?

78

some comments? especially around fragment applies to file matching all (or any) of the pathmatches

81

HasUnrecognizedCondition ?

also default init to true maybe?

85

some comments.

I am not sure if putting Fragment suffix to all of these makes sense, as they already reside inside Fragment. Maybe section or block ? (same goes for ConditionFragment)

87

maybe make this an llvm:Optional too. Even though emptiness of Add would be OK in this case, for non-container "blocks" we might need to introduce an optional, and homogeneity would be nice when it happens.

kadircet added inline comments.Jun 24 2020, 4:23 AM
clang-tools-extra/clangd/ConfigYAML.cpp
29

a comment for this buf, especially the reason why it's a member rather than a function-local when needed. (I suppose to get rid of const/dest penalties?)

40

It is a little bit subtle that "at most once" execution of this handler is assured by DictParser. Can we put some comments explaining it to either DictParser itself or to handle member ?

74

maybe a comment for these two, suggesting the latter is used for unknown keys.

and also mention it is handlers responsibility emit any diagnostics in case of failures?

84

maybe assert on duplicate keys in debug builds?

100

i suppose yamlparser has already emitted a diagnostic by then ? can you add a comment about it if that's the case.

and if not, shouldn't we emit something about a broken stream?

same goes for KV.getValue() below.

105

maybe Ignoring duplicate key instead of just Duplicate key ?

136

s/S->getValue(Buf)/BS->getValue()/ (or change auto *BS to auto *S)

and a test

150

shouldn't we error/warn out if not ?

161

do we really want to return a boolean here (and not at the warning ?)

I can see that it's nice to return error("xxx") but it looks surprising as we are not returning the error but just false.
I would make this void, up to you though.

202

*sigh*

what a mess, Scanner in YamlParser already adds this buffer. It is unclear why though, as it adds a non-owning buffer and that buffer is never accessed. (well apart from SMLoc to Buffer conversions when emitting diags)
It rather makes use of pointers to the underlying data, obtained before adding that buffer. So it is the caller keeping the data alive anyways.

should we consider having an entry point that takes an SMLoc and a SourceMgr for yaml::Stream instead of this? That way we can add the owning entry to sourcemanager and start the stream at beginning of that buffer.

i am afraid this might bite us in the future, as SM is exposed publicly and one might try to iterate over buffers, which might come as a shock. we can postpone until that happens though, so feel free to leave it as it is.

Thanks! Looks great, I've finally managed to finish all my iterations :D

Apart from the comments(IIRC which are mostly minor), it felt like there were a few lines that aren't clang-format'd. Please make sure to run clang-format on the files at some point.

clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp
58

is there an explanation to what these ranges are ?

All I could find is somewhere in the implementation making sure these are relative to the D.getLineNo and nothing else :/

73

why no print others ?

Kind: D.Message @ Pos - Range

133

i think having a full matcher(i.e with kind, pos and ranges) might be more expressive here, up to you though

sammccall updated this revision to Diff 273254.Jun 25 2020, 1:50 AM
sammccall marked 38 inline comments as done.

Address review comments

Thanks!

Apart from the comments(IIRC which are mostly minor), it felt like there were a few lines that aren't clang-format'd. Please make sure to run clang-format on the files at some point.

git-clang-format thinks it's clean, so please point out any examples you see!

clang-tools-extra/clangd/ConfigFragment.h
9

This first sentence does, and I'll put it there too, but I think one redundant sentence to establish the context is reasonable.

The next three lines belong in this file because they describe how users control config (which they do through ConfigFragment, not through Config).

16

again i don't think these are relevant here.

I think how this code relates to the rest of the system is a pretty important thing for documentation to cover! I wish docs in general did a better job of this.
Relying on xrefs in the code is pretty noisy and low-level, and guessing at filenames doesn't scale well.

maybe provide more details about any new config option should start propagating from here, as this is the closest layer to user written config,

Yeah - it depends whether you're looking at "here is a feature I'd like to make configurable" vs "here is some configuration syntax I want to support".
I've moved it here, but I'm not 100% sure it's the right call.

61

what about fromYAML

I can live with it if you like, but I prefer parse:

  • Fragment::fromYAML reads funny as it returns multiple fragments in general
  • parse describes the action more clearly and so hints toward thinking about error handling etc

returning vector<const Fragment> ?

why? the resulting vector is slower (has to copy instead of move on reallocation).
It prevents us from moving from the fragment when we compile it (which we do in some places)

65

Oops, this is so I could give them a collective comment, and I forgot to do it...

The idea is everything else represents the directly-user-specified "body" of the configuration, but these fields are different. (The associated directory also goes here, though wasn't needed in this patch).
I wanted to wall these off a bit as the outer struct will gain many fields and so this may become less obvious. (Also makes it easier to skip over if we're generating docs etc)

69

Changed to "should be ignored". guaranteeing that the SMRanges is stronger but not actually useful.

73

There may be multiple fragments in a file.

When tracking how fragments are used in the logs, there should be enough information to identify the particular fragment, without having to guess.

  • during parsing, we can log in some source-specific way
  • in the Fragment form, we can log this location to identify the fragment
  • during compilation, we log this location and the pointer address of the new CompiledFragment
  • subsequently we can just log the CompiledFragment address
75

const members are a pain, as they prevent moving the object (and vectors of them are doing copies, etc). For that reason I generally don't find "I don't semantically need to modify this" a good enough reason to mark a member const.

Adding a constructor muddles the idea of a dumb struct and presents an inconsistent API - the other fields *also* don't really change during the lifetime (parseYAML() is basically a big constructor)

77

Whoops, done. There's an argument to be made that we shouldn't maintain docs in two places, and the user docs should be canonical.
Something to think about later, maybe we can generate them.

I'm always unsure whether to put the doc for "blocks" like this on the struct or the field...

81

Renamed. I guess you meant false here? (an empty condition should not have this set)

87

I'd rather push this the other way: the presence of an empty block like this shouldn't have any semantic effect - I think we can be consistent about this and it helps the schema feel regular/extensible/predictable.

Condition was the only tricky case, but I don't think we need it, so I've removed the Optional on the condition instead. it's an AND, so an empty set of conditions evaluates to "unconditionally true" anyway.

clang-tools-extra/clangd/ConfigYAML.cpp
29

Moved it inline instead.

40

Ooh, hadn't noticed the subtlety with emplace :-)
Emplace is gone since condition is no longer optional.
Added a comment too.

74

maybe a comment for these two, suggesting the latter is used for unknown keys.

These are private fields, the public trivial setters are commented, I don't think commenting them again adds value.

and also mention it is handlers responsibility emit any diagnostics in case of failures?

Done.

136

Nice catch. This would be a good clang-tidy check I think... variable initialized in an if and then used in an else.

150

we do: scalarValue warns if it returns None and is documented to do so.

161

there's just one callsite, made void for now, can revisit with more callsites.

202

I can't think of any reason we'd iterate over the buffers, as there's only ever one real buffer and it's identifiable as the "main file" in the usual way.
So I'd rather take the horrible API if/when it actually becomes a problem.

clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp
58

These are ranges to highlight (e.g. in console output). SourceMgr/SMDiag is modeled after clang's SourceManager/Diagnostic, where this stuff is a bit better documented.

133

Done, though rather separate matchers as otherwise it's too much work to get decent error messages.

kadircet marked 5 inline comments as done.Jun 25 2020, 8:41 AM

LGTM with couple of nits.

Regarding clang-format, I was mostly confused by template <typename T> class Located { being on a single line, but apparently that's the style :D

clang-tools-extra/clangd/ConfigFragment.h
61

I can live with it if you like, but I prefer parse:

I was mainly unhappy about it because parse doesn't reflect the construction bit of the process, but it is clear from the signature and you are right about returning multiple fragments contradicting with the idea of from. I am more inclined towards from but I am fine with parse too if it feels more natural to you.

why? the resulting vector is slower (has to copy instead of move on reallocation).

It was to signal the fact that these correspond to some immutable input received, but I giving up on move semantics sounds like a nasty consequence. So let's keep it that way (unless you want to hide the members and provide only const accessors, which is some plumbing for a simple struct like this and might hinder readability)

73

right, I was thinking that we already have location information available for the start of the fragment during parsing and wasn't sure how useful it would be in later on stages as we know the exact location of the relevant entries (keys, scalars etc.) for logging.

I wasn't against the idea of storing it, just wanted to learn if it has any significant importance for the upcoming stages. thanks for the info!

75

this was in parallel with the idea of returning vector<const Fragment>. so agreed.

77

I suppose it makes sense to put the part starting with /// Conditions based on a file's path use the following form: above the field rather than the struct, especially if we are planning to generate docs.

also:
s/ConfigFragment/Fragment/

81

I was actually suggesting true to make sure this is explicitly set by the parser in case of success and not forgotten when something slips away. but not that important.

87

sure SG, I was rather afraid of introducing an int option one day, then we might need to signal it's existence with an Optional but one might also argue that we would likely have a "default" value for such options, hence instead of marking it as None we could just make use of the default in absence of that option.

clang-tools-extra/clangd/ConfigYAML.cpp
136

yup sounds good, also i would change the else if to an if, as we are returning anyways (which might be another tidy check)

150

right but this function still returns Result and not None

clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp
58

right, sorry i forgot to mention in the first comment. I was rather afraid of Range being relative to start of line, but being contained in another line. (due to the usage on start.character and end.character down below)

kadircet accepted this revision.Jun 25 2020, 8:41 AM
This revision is now accepted and ready to land.Jun 25 2020, 8:41 AM
sammccall marked 9 inline comments as done.Jun 25 2020, 1:55 PM

Thanks for the careful review!

clang-tools-extra/clangd/ConfigFragment.h
73

Yeah, storing the start of the fragment given a slightly more correct location to represent the fragment itself, but more importantly we don't have to hunt around the config to find something that has a location.

77

Yeah - the counter-argument is that if you *are* reading the code then having the general documentation above the specific stuff reads better. And we can teach a doc generator/extractor to grab the right text easily enough. I'll leave it for now, I think.

s/ConfigFragment/Fragment/

Doh, I did update this everywhere, but apparently not in my brain, so it keeps coming back...

81

The problem is the only sensible implementation for a parser is to set it to false again, then set it to true once you hit an unknown key.
Particularly now that condition is not Optional. I think the invariant that a default-initialized Fragment corresponds to an empty document is more valuable.

87

Oh definitely I think we'll want Optional for singular leaf fields, like Optional<bool> BackgroundIndexBlock::Enabled or whatever.
You want a tristate of yes/no/inherit-existing and Optional is the best fit for that.

I meant specifically for *these "blocks" that act as namespaces*, we avoid giving side-effects to the presence/absence, so Optional isn't needed to encode some user intent.
(Not sure how to formalize this namespace idea, it's something struct-like that appears as a singular field in something struct-like)?

clang-tools-extra/clangd/ConfigYAML.cpp
150

Yeah, we ignore the illegal value and continue.
We could make this a hard error, but that is commitment that where we parse a scalar today we'll never accept anything else in future.
Whereas we might choose to allow some other structure in future (without picking a new name for the key). LSP is full of this...
On the other hand not seeing a dict where you expect one is a hard error, because there are fewer reasons to switch from a dict to something else.

clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp
58

Oh yeah that threw me for a loop too. I think I got the semantics here right, in any case it's only tests (we don't own any of the code *generating* the ranges, and we're not consuming it in production code yet).

This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.