This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Config: compile Fragment -> CompiledFragment -> Config
ClosedPublic

Authored by sammccall on Jun 25 2020, 4:50 PM.

Details

Diff Detail

Event Timeline

sammccall created this revision.Jun 25 2020, 4:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2020, 4:50 PM
hokein added inline comments.Jun 29 2020, 12:43 AM
clang-tools-extra/clangd/CMakeLists.txt
41

we have a few config-related files now, I wonder would it make sense to create a config/ subdir?

clang-tools-extra/clangd/ConfigCompile.cpp
48

this is only used in method compileRegex, looks like can be moved to a local variable? is there life-time issue?

97

this member seems unused.

clang-tools-extra/clangd/ConfigProvider.h
33

nit: I find the name Params is too general, it implies less information. When I read the code using Params, I have to go-to-definition to see what it means.

if we only have a single member in this struct, maybe just use it directly.

41

I might miss the context, where is the provider, not added yet?

48

Would it be more nature to have a function compile(or so) to do the actual compile (Fragment -> CompiledFragment) rather than doing it in a constructor?

clang-tools-extra/clangd/unittests/ConfigCompileTests.cpp
41

my first impression was that each // XXX is a separated test, but it turns out not, the tests here seem to be stateful -- Frag keeps being modified. I think it is intentional, but it is hard for readers to track given that the code is long.

sammccall marked 10 inline comments as done.Jun 29 2020, 2:29 PM
sammccall added inline comments.
clang-tools-extra/clangd/CMakeLists.txt
41

It seems marginal to me - could go either way. We have a small handful (2 headers and 3 impl files). There may be 1-2 more, but I'm not sure yet. (config-files-on-disk, and json->fragment).
So 6-7 files... smaller than the other subdirs, but not tiny.

There's also the fact that config.h/cpp is not like the others: all features (not just infra) depend on this, it's not in the config:: namespace etc. So should it really be in the subdir?

Mind if we hold off on this until at least the scope of D82335 has landed? I think it would be awkward to do this move halfway through this patchset, but fairly easy to do at any point that things are stable.

clang-tools-extra/clangd/ConfigCompile.cpp
97

it's used on line 52 diag(Error, ...).

I wouldn't bother extracting it except there should be a bunch more places where errors are emitted, and keeping them readable is worthwhile IME.

clang-tools-extra/clangd/ConfigProvider.h
33

Agree with the vague name.
In most contexts, the name is config::Params which is at least a bit more specific.

if we only have a single member in this struct, maybe just use it directly.

I don't like this, though - there are likely to be more things here in the future (e.g. platform), and passing around a string is much less clear as to intent. So any idea about a name for this struct?

What about config::Environment or config::Env? still too vague?

41

Right, D82335 has a rought draft of everything together (it's called ConfigProvider in that patch).

I think it's less churn-y to add the documentation as the classes are added, even if it means temporarily referring to something that doesn't exist yet.

48

I'm not sure. The idea that the input/result is Fragment/CompiledFragment, and that the compilation cannot fail, suggests to me that a constructor is OK here.
On the other hand, the DiangosticCallback param (which we don't keep a reference to) is a bit weird.

So I don't really feel strongly about it one way or the other, happy to change it if you do think it would be significantly better.

clang-tools-extra/clangd/unittests/ConfigCompileTests.cpp
41

Agree. Made this test reset fragment state after each group of assertions.

sammccall updated this revision to Diff 274237.Jun 29 2020, 2:29 PM
sammccall marked 3 inline comments as done.

Address some review comments

hokein added inline comments.Jun 30 2020, 2:54 AM
clang-tools-extra/clangd/CMakeLists.txt
41

it seems marginal to me - could go either way. We have a small handful (2 headers and 3 impl files). There may be 1-2 more, but I'm not sure yet. (config-files-on-disk, and json->fragment).
So 6-7 files... smaller than the other subdirs, but not tiny.

if we end up with 6-7 files, I'd prefer to create a subdirectory, which make our source structure a bit tidy, our clangd/ dir already has a lot of source files.

Mind if we hold off on this until at least the scope of D82335 has landed? I think it would be awkward to do this move halfway through this patchset, but fairly easy to do at any point that things are stable.

sure, I didn't mean we should do this in this patch.

clang-tools-extra/clangd/ConfigCompile.cpp
66

I think if this case happened, we might just bail out directly -- adding the regex conditions seems unnecessary, we never execute them on Line 127 (but we might still need computing the regex for diagnostics).

and I'm not sure the motivation of using vector for Conditions and Apply, it looks like Apply is just 1 element (if we do the bailout for conditions, Conditions will be at most 1 element).

I feel like using a single unique_function for Conditions and Apply might be a better fit with Fragment. There is a comment in ConfigFragment.h:

Each separate condition must match (combined with AND).
When one condition has multiple values, any may match (combined with OR).

so my reading is that:

  • Fragment and CompiledFragment is a 1:1 mapping
  • A Fragment::Condition represents a single condition, so the AND match applies for different Fragments, which is in ConfigProvider?
  • A single condition can have multiple values (PathMatch), we use OR match

Is that correct?

85

nit: use early return.

97

Ah, I see. I must have missed that.

btw, could you spell out the auto type for Error? I believe it is DiagKind, and it would be more clear that it is used in the first parameter of diag (given that they are close).

clang-tools-extra/clangd/ConfigProvider.h
33

not better idea. config::Env doesn't seem to be much better... and the signature apply(const Params &P, Config &C) seems neat. let's keep it as-is.

The comment of Path is a bit vague, it would be nice to clarify what kind of the file we're targeting to? the config file or the .cpp source file?

48

You point is fair.
I'm not sure using compile is *significantly* better, but it is better IMO -- conceptually, the class name CompiledFragment implies that there should be a compile API (this was my impression when reading the code at the first time)

it might be ok to keep as-is.

clang-tools-extra/clangd/unittests/ConfigCompileTests.cpp
32

nit: compileAndApply?

sammccall marked 8 inline comments as done.Jun 30 2020, 7:21 AM
sammccall added inline comments.
clang-tools-extra/clangd/CMakeLists.txt
41

if we end up with 6-7 files, I'd prefer to create a subdirectory

SGTM. I'll carry on with this structure for now, and move files after - please remind me if I forget!

clang-tools-extra/clangd/ConfigCompile.cpp
66

I think if this case happened, we might just bail out directly -- adding the regex conditions seems unnecessary, we never execute them on Line 127 (but we might still need computing the regex for diagnostics).

and I'm not sure the motivation of using vector for Conditions and Apply, it looks like Apply is just 1 element (if we do the bailout for conditions, Conditions will be at most 1 element).

So the explanation for both of these things is that this file is going to grow a lot, and in particular these sections handling PathMatch etc are going to occur again and again, so we need a scalable pattern. I've deliberately limited the scope of the first patchset to only support one condition, one setting to apply etc, but you have to imagine there are likely to be 5 conditions and maybe 25 settings.

So with that in mind:

  • yes, actually saving the conditions isn't necessary, but we do need to evaluate them for side-effects (diagnostics) and it's simpler not to keep track of them. Optimizing performance of broken configs is not a big win :-)
  • it's marginally simpler to have only one Apply function, but much simpler if separate parts of compile() can push settings independently. Same goes for condition in fact...

Fragment and CompiledFragment is a 1:1 mapping

Yes. I think this is somewhere where the naming is helpful at least :-)

A Fragment::Condition represents a single condition, so the AND match applies for different Fragments, which is in ConfigProvider?

The AND match applies to the multiple Conditions in a single CompiledFragment. Each fragment only considers its own conditions.
So given:

If:
  Platform: win
  PathMatch: [foo1/.*, foo2/.*]
CompileFlags:
  Add: -foo
---
CompileFlags:
  Add: -bar

We end up with... one Provider, suppling two CompiledFragments (each compiled from a separate Fragment).

The first CompiledFragment has two Conditions, conceptually:

  • Params.Platform == "win"
  • Regex("foo1/.*").matches(Params.File) || Regex("foo2/.*").matches(Params.File)

The second CompiledFragment has no Conditions. Each fragment has one Apply function.

The CompiledFragments are applied independently. They each consider their own Conditions, ANDing them together.

So CompiledFragment #1 checks both conditions. If any fails, it bails out. Otherwise it runs its Apply function, adding -foo.

Then CompiledFragment #2 runs. It has no conditions, so never bails out. It adds -bar.

85

Early return isn't a pattern that's compatible with lots of optional pieces - it doesn't scale as this file grows.

clang-tools-extra/clangd/ConfigProvider.h
48

Argh, you convinced me this would be better, but I can't come up with a form I like.

  • static CompiledFragment::compile repeats itself in a weird way. But any other option requires messing around with friend functions, which feels off
  • Fragment::compile is maybe the most natural, but the dependencies are backwards (needs the CompiledFragment type, it gets implemented in the wrong file...).
  • standalone compile() is maybe the most palatable, but seems a little undiscoverable
  • having a function return CompiledFragment means that putting it into a shared_ptr is annoying work, and breaks our pointer-logging trick, but having the factory function return a pointer also seems odd.

I'm starting to think maybe the CompiledFragment class shouldn't be public at all, and we should just using CompiledFragment = unique_function<bool(const Params&, Config&) const>. WDYT?

sammccall updated this revision to Diff 274470.Jun 30 2020, 7:23 AM
sammccall marked an inline comment as done.

address comments

sammccall marked an inline comment as done.Jun 30 2020, 7:29 AM
sammccall added inline comments.
clang-tools-extra/clangd/ConfigProvider.h
48

I'm starting to think maybe the CompiledFragment class shouldn't be public at all

Thinking more about this:

  • making it std::function<...> instead lets us wrap the shared_ptr inside which makes a few APIs a bit cleaner
  • as its an interface type, this leaves config providers free to use other ways of manipulating config rather than going via fragments, which is potentially nice for embedders
  • the main downside is we're committing to not exposing any more structure (e.g. querying a fragment for which directory it applies to)

starting to quite like this idea.

sammccall updated this revision to Diff 274503.Jun 30 2020, 8:41 AM

CompiledFragment becomes a std::function, compile interface moved to Fragment.h

hokein added inline comments.Jun 30 2020, 1:25 PM
clang-tools-extra/clangd/ConfigCompile.cpp
41

nit: I'd use a more verbose name, CompiledFragmentImpl.

47

maybe Impl::applying config ... to allow better debug logging?

this reminds us: since the config can easily become a monster, think about multiple config files from different sources (user config file, project config file, LSP, flags) are applied together, we might need a tracing mechanism (at least some useful logs) to help us diagnose a final complicated config. no need to do anything in this patch, just something we should keep in mind.

66

oh, thanks for the explanation, this helps me a lot to understand the code.

so the member Condition in Fragment will be enhanced to something like std::vector<XXX> Conditions in the future? If so, can we defer that for CompiledFragment as well?

Since Fragment and CompiledFragment is 1:1 mapping, I think it is important to keep their implementation aligned, it'd help a lot for readers to understand the code. The condition logic (AND, OR) is subtle and complicated, I was reading this part of code back and forth, still inferred it in an incorrect way.

127

nit: maybe name it ConfigFile, SourceFile sounds like a .cpp file.

clang-tools-extra/clangd/ConfigFragment.h
35 ↗(On Diff #274503)

the dependence ( ConfigFragment.h depends on ConfigProvider.h) seems a bit counterintuitive...

81 ↗(On Diff #274503)

this API is neat, thanks!

clang-tools-extra/clangd/ConfigProvider.h
48

SG

sammccall updated this revision to Diff 274623.Jun 30 2020, 2:54 PM
sammccall marked 13 inline comments as done.

Address comments

clang-tools-extra/clangd/ConfigCompile.cpp
47

I've changed these from config::Fragment to "Config fragment" to be less misleading. I'm not sure sprinkling in internal class names helps much vs having them be clear sentences, I always end up searching for the log string to see exactly where it comes from anyway.

This is dlog, it would be nice to make it vlog but I think it's just way too spammy, we'll get several of these for every file that's opened and background-indexed...

we might need a tracing mechanism (at least some useful logs)

My secret plan is to have a clangd flag like clangd --check <filename> that will *not* run the language server, but instead dump a bunch of information about how clangd would process that file (stringing together CDB, config, query-driver, diagnostics...). This would miss config-over-LSP but in every other respect seems much simpler than trying to find ways to extract this stuff live and get it in front of the user somehow.

66

so the member Condition in Fragment will be enhanced to something like std::vector<XXX> Conditions in the future?

Ah, not quite...
Fragment is supposed to closely follow the YAML format, which is:
If: { PathMatch: foo, Platform: bar }
not
If: [ {PathMatch: foo}, {Platform: bar} ]
That is, the partitioning of parts of the If block into Conditions is implicit rather than explicit. (It's basically partitioning on the map keys, but maybe there'll be exceptions).

The subtle difference between Fragment::Condition and CompiledFragmentImpl::Condition is confusing though. I think renaming the former to Fragment::If is much clearer (not sure why I didn't name it that in the first place).

Since Fragment and CompiledFragment is 1:1 mapping, I think it is important to keep their implementation aligned

I think the fragment and the fragmentcompiler are pretty well aligned, but I don't think duplicating the fragment structure in the final CompiledFragmentImpl is a good idea.

The reason is captured pretty well in the essay "Parse, don't validate". Through compilation here we want to examine the config and establish that we can apply this configuration efficiently, and without any more errors. The type-driven way to do that is to emit a function that applies the configuration and whose signature doesn't allow for errors.

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

clang-tools-extra/clangd/ConfigFragment.h
35 ↗(On Diff #274503)

Agree, it seems backwards - this is the one thing that seemed cleaner about the old factoring.
But it does appear that everything is in the right file, the dependency is needed, and a (circular) dependency in the other direction won't be needed so... maybe we can live with it?

The way I think of it is that ConfigProvider is defining an entirely abstract interface that ConfigFragment is implementing.

hokein accepted this revision.Jun 30 2020, 11:52 PM

Thanks, looks good.

clang-tools-extra/clangd/ConfigCompile.cpp
47

that sounds good.

66

oh, I think I understand how the design is driven like this now. that makes sense.

I think it would be helpful to add some documentation (and give a concrete config example, the example you gave is good enough) for CompiledFragmentImpl::Conditions.

clang-tools-extra/clangd/ConfigFragment.h
100 ↗(On Diff #274623)

I think this comment is important, but it is a bit vague (not quite friendly to readers without much config context), we might want to refine it -- would be nice to clearly state what's a condition, e.g. "PathMatch" is a condition, "Platform" is a condition.

maybe separate the rename to a separate patch, but up to you.

108 ↗(On Diff #274623)

nit: maybe explicitly spell this is an OR match.

This revision is now accepted and ready to land.Jun 30 2020, 11:52 PM
sammccall marked 8 inline comments as done.Jul 1 2020, 1:04 AM

Thanks a lot for the review, I think this is a lot better now.
And particularly on where docs are lacking - I'm happy to write comments but it's hard for me to know which bits are least clear.

clang-tools-extra/clangd/ConfigFragment.h
100 ↗(On Diff #274623)

Updated comment to tie the concept of conditions to the contents of the If block, and added an example for the OR matching.

108 ↗(On Diff #274623)

I'd prefer not to do this explicitly in each condition, as it's hard to get across in a couple of words and I think it'd hurt readability overall to repeat it so much.

This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.