This takes ~5% of time when running clangd unit tests.
To achieve this, move mapping of system includes out of CanonicalIncludes
and into a separate class
Differential D67172
[clangd] Use pre-populated mappings for standard symbols ilya-biryukov on Sep 4 2019, 7:20 AM. Authored by
Details This takes ~5% of time when running clangd unit tests. To achieve this, move mapping of system includes out of CanonicalIncludes
Diff Detail
Event TimelineComment Actions Also considered moving the std symbol mapping into a separate (private-to-implementation) class, e.g. we don't use suffix mappings and symbol mappings outside system includes. Comment Actions It took me a while to understand this patch. I'd suggest doing it in this patch as well (it should not require large effort to do it, I think?) And I'm not sure whether it is possible to remove the suffix mapping completely -- for STL symbols it is possible, but for other symbols like GNU C headers, I didn't find out an automatic way to build a lookup table. Comment Actions Ok, agreed. Adding this indirection layer is probably making things a bit too complex. I'll update the patch to clearly separate the std symbol mapping and canonical includes. It's because we don't have anything similar to cppreference.com for them? Maybe we could try scrapping man pages ? Comment Actions Not sure whether the new code is a win overall, it's still a bit messy
Comment Actions yes, cppreferences only provides symbols from C++/C standard libraries. Parsing man page is possible, but it requires some work (we have to write another tool to parse it).
Comment Actions I had the same reaction as Haojian to the original patch, thanks for clearing that up. I do wonder whether we're microoptimizing for the tests too much, I don't think 5% on the tests is worth very much in itself, unless it's speeding up real workloads or improving the code (it may well be). I'm nervous about requiring langopts, it's convenient in tests/prototypes/etc to be able to default-construct CanonicalIncludes (e.g. a test that doesn't actually care about standard library mappings). Comment Actions Even though tests don't parse real C++ programs, I wouldn't call it a micro-optimization. Spending 5% on such a low-level operation is not ok, small things like that can add up and have a subtle effect on performance.
We can have a factory method that constructs with default lang opts for tests, I don't think this would be an issue in the actual application code. In fact, I think it'll make the application code better. Comment Actions Only if it's 5% of something meaningful. If the denominator isn't something we care about, then it's really "spending XXX usec is not ok" - which depends on XXX I think.
You'll note that I didn't say that *anything* about the current version was problematic :-) One way this could be simpler is if suffix mappings were gone, then the StdIncludes class would go away and we'd just be left with a stringmap<string>. Then there'd be a performance win with almost no extra complexity. Some of the new complexity seems unneccesary even with this approach. StdIncludes *getStdIncludes(LangOpts) { if (is c++) { static pair<...> entries[] = ...; static StdIncludes *CPP = new StdIncludes(entries); // the things that apply everywhere are in the StdIncludes constructor return CPP; } else if (is c) { ... } else { static StdIncludes *Others = new StdIncludes({}); return Others; } }
I'm not saying that there's no way to work around it, or that it's fatal, but my opinion is that this would be worse than the current state.
Comment Actions Agree, but unit-test time is meaningful clangd developers. Not saying this is more important than real use-cases, but keeping the unit tests enables us to iterate faster.
We seem to know how to eliminate this in principle. See @hokein's comment, we could try parsing man pages or even the headers themselves and building a corresponding symbol map.
Will update the patch. Although I find the current approach more direct, I will happily change the code according to your suggestion.
If we care about having a default constructor, we can default to using StdIncludes with a default mapping. I'll avoid requiring LangOptions in the constructor in that case. Thanks for the early feedback! Comment Actions The new version looks very much like the older one, with a few additions to not re-initialize data twice.
Comment Actions LG, though if we can drop the struct and make MaxSuffixComponents a constant it'd be simpler still. Sure. This is going to win a couple of percent I guess: for these cases we care about walltime including rebuild and lit startup overhead. It's worth having (and thanks for doing this!) but I don't think we should pay much complexity.
"we could try" sounds like we *don't* know how to eliminate it. Parsing manpages aside, I thought the main problem was these symbols are nonstandard and an infinitely portable qname->header mapping simply didn't exist.
This patch adds code shoveling around suffix maps, and so increases the cost of this feature. (Less so after latest changes). It also increases the scope of a future patch that would remove suffix maps. Sequencing this the other way would be better, if removing suffix mapping was straightforward. (I don't think it is)
Thanks, this looks a lot better!
Comment Actions Done.
Agreed.
I would expect qualified names to be more portable than paths, but might be mistaken. I recall that we might run into problems if folks define some names as macros instead of functions, but I would be interested to see how common is this.
I think the simplest option is to always make the path suffix mapping available in the class. Comment Actions In practice, I expect almost all of these names to be C and thus unqualified. Names are indeed more portable but unlike path mappings we can't just include both when different platforms want different things, or nothing.
From a public API standpoint, it seems cleaner to only have these available once addSystemHeadersMapping has been called (with any argument), and delay assigning the pointer until then. Comment Actions
|
nit: maybe StdSuffixHeaderMapping to clarify the purpose, now these are std-only