Details
- Reviewers
sammccall ioeric - Commits
- rG06553bfe9694: Introduce shard storage to auto-index.
rG3e5a47560ce8: Introduce shard storage to auto-index.
rL347038: Introduce shard storage to auto-index.
rCTE347038: Introduce shard storage to auto-index.
rCTE346938: Introduce shard storage to auto-index.
rL346938: Introduce shard storage to auto-index.
Diff Detail
- Repository
- rCTE Clang Tools Extra
- Build Status
Buildable 25043 Build 25042: arc lint + arc unit
Event Timeline
clangd/index/Background.cpp | ||
---|---|---|
32 | Please run Clang-format. Headers in sections should be in alphabetical order. |
clangd/index/Background.h | ||
---|---|---|
31 | nit: prefer the other way around: doc on the interface, the disk implementation can defer to the interface when there's nothing special to say. | |
39 | Why not use the constructor? what does "directory" mean in the general case? |
clangd/index/Background.cpp | ||
---|---|---|
256 | nit: i'd suggest doing the writes *after* updating the index, as the latter is user-facing | |
329 | I don't think this is the right place to be reading shards, vs on startup. It means we're doing extra IO whenever a file is reindexed, and more importantly we don't make the index contents available on startup (if any file needs reindexing, reads may get stuck behind it). (as discussed, we can also leave reading out of this patch entirely) | |
clangd/index/Background.h | ||
35 | nit: we probably don't need "shard" both in the interface name and the method names. Consider either BackgroundIndexStorage/storeShard or ShardStorage/store | |
38 | this signature looks surprising to me, expected something like `Expected<pair<IndexFileIn, FileDigest>> retrieveShard(ShardID). e.g. if we have a stale shard for a file when initializing, maybe we want to use it until we can reindex the file? Current interface doesn't allow this. As discussed offline, it'd also be fine to leave retrieve out of the initial patch. | |
62 | So we have several possible designs on a collision course:
What are we aiming for here? The design of ShardStorage in this patch precludes 1, taking unique_ptr here precludes 2. 3 requires us to revisit some aspects of BackgroundIndex (which is possible). With 4 it isn't obvious how we do multi-config, or how users can manage the storage. (I'd been assuming 1 or 2, but I don't think we ever discussed it) | |
123 | This class can be hidden in the cpp file. |
clangd/index/Background.cpp | ||
---|---|---|
256 | but updating index consumes slabs. Is it worth making a copy? | |
329 | What exactly you mean by startup exactly from BackgroundIndex's perspective. Is it enqueueAll ? | |
clangd/index/Background.h | ||
38 | I thought it would be better to get rid of disk io if the file was out-of-date, but using the stale file and deferring indexing also seems like a good idea. Moving with it. | |
39 | Directory refers to the one specified in CompilationDatabase(which is usually the build directory?), sorry for the inconvenience. | |
62 | The aim was more like 2. But I forgot the fact that we could be working with multiple CDBs :D So it might be better to just pass a factory to BackgroundIndex and let it create one ShardStorage for each CDB(which is distinguished by the build directory?) WDYT? | |
123 | But it needs to be passed into the constructor of BackgroundIndex, possibly by the ClangdServer, do you suggest hiding inside ClangdServer? |
clangd/index/Background.cpp | ||
---|---|---|
256 | Oh, true - probably not. Add a comment? | |
329 | Yes. (Or rather, a task scheduled there) | |
clangd/index/Background.h | ||
39 | tooling::CompileCommand::WorkingDirectory? That doesn't seem especially relevant here. Yes, this seems to be only relevant to DiskBackedIndexStorage | |
62 | Yep, that works for me. | |
123 | We can expose a factory function (e.g. static on ShardStorage) |
clangd/index/Background.h | ||
---|---|---|
39 | I suppose you meant tooling::CompileCommand::Directory rather than WorkingDirectory ? That is the one I was talking about, why do you think it is irrelevant ? |
- Address comments.
- Move loading from cache out of indexing.
- Use a factory for creation of index storage.
clangd/index/Background.h | ||
---|---|---|
39 | Changed logic to use Directory passed to enqueueAll or enqueue, which I assume is the directory that the compilation database lives. And I assumed we will have at most one compilation database per directory. |
clangd/index/Background.cpp | ||
---|---|---|
53 | nit: these micro-optimizations with SmallString seem unlikely to be worthwhile - maybe just take StringRef and return string? | |
110 | absoluteFilePath? (otherwise it's not clear what the path is to) | |
148 | this absolutely can't be done synchronously (it's much slower than calling getAllCompileCommands - this could block clangd for tens of seconds when opening the first file). Can we drop shard-loading in BackgroundIndex from this patch? | |
162 | you can capture indexstorage by value | |
clangd/index/Background.h | ||
34 | I don't think we should be using a static mutable factory here. I think I worded the previous comment about exposing DiskBackedIndexStorage badly, what I meant was something like: static unique_ptr<BackgroundIndexStorage> BackgroundIndexStorage::createDiskStorage(StringRef dir) This would be separate from how the storage strategy is configured on the background index. | |
34 | do we need shared_ptr here? | |
38 | unused? | |
42 | Prefer llvm::Error over bool, or no return value if you don't think we ever want to handle it. | |
45 | doc stale | |
48 | nit: I think "loadShard" would better emphasize the relationship with storeShard, as load/store is such a common pair. (read/write would similarly work). |
clangd/index/Background.cpp | ||
---|---|---|
53 | It wasn't for optimization purposes actually, sys::path::append only accepts a SmallString and there is no direct conversion between SmallString and std::string(or stringref). |
Thanks, mostly interface tweaks now.
clangd/index/Background.cpp | ||
---|---|---|
133 | I think it would be simpler to just disallow the storage factory from returning null, and instead have it return a no-op implementation that logs. Again, this moves the complexity behind an abstraction, where the rest of the code is simpler. | |
clangd/index/Background.h | ||
36 | I like that the implementations treat the shard scheme as opaque, but it'd be good for humans to know that this is going to be a filename in practice. I'd suggest either changing the param name or adding to the interface comment: "Shards of the index are stored and retrieved independently, keyed by shard identifier - in practice this is a source file name" | |
39 | docs | |
39 | Hmm, we're going to attempt to load the shard corresponding to every file that we have in the CDB, even if the index isn't built yet. So "file doesn't exist" is an expected case (where we don't want to log etc), vs e.g. "file was corrupted" is unexpected and should definitely be logged. How do we distinguish these with this API? One defensible option would be to have implementations handle errors: return Optional<IndexFileIn>, and have the disk version log errors appropriately and return None. Another would be Expected<Optional<IndexFileIn>> or so, which is a little messy. | |
43 | docs | |
55 | nit: I'd move the description of IndexStorageFactory semantics to above the Using declaration - generally splitting docs to the most specific decl should make them easier to match up to the code. | |
56 | by the directory they are found in? | |
106 | I do think having the factory own and cache the storages would be simpler:
| |
106 | don't accesses to this map need to be locked? |
- Address comments.
clangd/index/Background.h | ||
---|---|---|
39 | used the former proposition, but with std::unique_ptr instead of llvm::Optional since IndexFileIn is move-only. |
just some nits. Main thing is the LoggingStorage - again I think this may have been a misunderstanding.
clangd/index/Background.cpp | ||
---|---|---|
76 | ADD_FAILURE() I think | |
94 | nit: move constructor to the top? | |
98 | create_directory returns success if the directory exists, so no need for this check | |
108 | Hmm, I'm not really sure what this is for - does someone constructing BackgroundIndex ever not want to define storage *and* not want to control logging? | |
265 | this doesn't give enough context - the logger is global to clangd. maybe "Failed to write background-index shard for file {0}: {1}" | |
clangd/index/Background.h | ||
39 | This is fine, though Optional works fine with move-only types. (your comment still says None) | |
48 | we may also want to provide the standard factory - can go in a later patch though. | |
unittests/clangd/BackgroundIndexTests.cpp | ||
147 | nit: seems a little less unusual to avoid the static: MemoryShardStorage Storage; auto MemoryStorageFactory = [&](llvm::StringRef){ return &Storage; } ... EXPECT_THAT(Storage.contains("root/A.h")); |
I fear we still need to expose createDiskStorage, because someone still needs to tell factory to which creator function to use.
Thanks! Looks good, rest of the nits are obvious or up to you.(
clangd/index/Background.cpp | ||
---|---|---|
37 | nit: I think the moves of the helper functions (digest/digestFile/getAbsoluteFilePath) can be reverted now | |
clangd/index/Background.h | ||
45 | Some docs? Maybe add a comment like // The factory provides storage for each CDB. // It keeps ownership of the storage instances, and should manage caching itself. // Factory must be threadsafe and never returns nullptr. | |
50 | nit: create rather than get would be clearer that this returns an independent object each time. | |
103 | (nit: this comment seems redundant with the type/name) | |
clangd/index/BackgroundIndexStorage.cpp | ||
30 | nit: can we put a separator like between the path and digest? e.g. Foo.cpp.12345.idx vs Foo.cpp_123456.idx vs `Foo.cpp123456.idx' | |
51 | nit: just "for index storage"? | |
79 | You're ignoring write errors here (and in fact errors may not have occurred yet due to buffering). OS.close(); return OS.error(); | |
unittests/clangd/BackgroundIndexTests.cpp | ||
100 | Consider pulling this class to the top level, and use it any time you don't care about the storage (to avoid the need for DummyStorage) | |
111 | I think this can just be (need to include ScopedPrinters.h) | |
121 | this one shouldn't be a failure I think - just return nullptr? (If the test cares about it, the test should depend on it more explicitly). The corrupt index file is a more obvious "should never happen" case. | |
130 | You never actually assert on this value. (I'm not sure you need to, but if not just remove it) | |
146 | seems clearer just to inline this in the constructor(, but up to you | |
162 | (even if the StringMap was encapsulated in the MemoryStorage, you can still write this assertion through the public interface: ASSERT_NE(Storage.loadShard("root/A.h"), None) (you could even assert the actual stored symbols, but may not be worth it) |
nit: prefer the other way around: doc on the interface, the disk implementation can defer to the interface when there's nothing special to say.
That way callers don't have to be confused about what the semantics are in the general case.