The elements of SearchPath::SearchDirs are being referenced to by their indices. This proved to be error-prone: HeaderSearch::SearchDirToHSEntry was accidentally not being updated in HeaderSearch::AddSearchPath(). This patch fixes that by referencing SearchPath::SearchDirs elements by their address instead, which is stable thanks to the bump-ptr-allocation strategy.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Adjusting the indices seem pretty fragile to me. Any reason why you wanted to stick with indices as keys instead of switching to something else like I suggested here?
I've implemented your suggestion locally like so:
llvm::SpecificBumpPtrAllocator<DirectoryLookup> SearchDirsAlloc; std::vector<DirectoryLookup *> SearchDirs; llvm::DenseMap<const DirectoryLookup *, unsigned> SearchDirToHSEntry; ...
In the end, I decided to stick with just updating the indices to avoid creating unnecessary indirection (and replacing . with -> everywhere).
I don't have any preference though, so if you feel strongly about the fragility, I can use addresses to refer to DirectoryLookup objects instead of indices.
I would prefer to use std::shared_ptr<DirectoryLookup> instead of unsigned. IIUC we already basically have a level of indirection because if we want to get the DirectoryLookup, we need to find it by its index in SearchDirs. And updating the indices just seems like a hack to me that can be solved more cleanly.
Wouldn’t we also need to update the indices in SearchDirsUsage and ModuleToSearchDirIdx? Or replace those by std::shared_ptr<DirectoryLookup> as well?
Yeah, in some cases, we'd be just replacing the index-based indirection with address-based indirection (e.g. when working with SearchDirToHSEntry).
However, most existing HeaderSearch algorithms already work with the index and use that to get the DirectoryLookup from SearchDirs. We'd be adding another level of indirection for them: first they'd need to get DirectoryLookup * from SearchDirs with the index and then dereference the pointer. Maybe that's not very important from performance perspective (especially if we bump-ptr-allocate them), but something to be aware of IMO.
The SearchDirsUsage member is a bit-vector that's being inserted into the same way SearchDirs is, so no need to fiddle with indices there. As for ModuleToSearchDirIdx: yes, we'd need to update indices the same way we do here.
I like this a lot better. Some comments inline.
clang/include/clang/Lex/HeaderSearch.h | ||
---|---|---|
183 | I haven’t tried it but is there a particular reason why this can’t be a const DirectoryLookup *? | |
clang/lib/Lex/HeaderSearch.cpp | ||
327 | Nitpick: SearchDirs[Idx] can be simplified to SearchDir->isFramework(). Similarly below. | |
clang/unittests/Lex/HeaderSearchTest.cpp | ||
276 | Wouldn’t it be cleaner to just check that UsedSearchDirs only contains a single element and that it’s name is /M2? In that case we could also remove getSearchDirUsage. |
clang/include/clang/Lex/HeaderSearch.h | ||
---|---|---|
183 | While iterating over SearchDirs, the elements can be passed to HeaderSearch::loadSubdirectoryModuleMaps that mutates them. | |
clang/lib/Lex/HeaderSearch.cpp | ||
327 | SearchDir will be removed in the following patch: D113676. | |
clang/unittests/Lex/HeaderSearchTest.cpp | ||
276 | Maybe I'm misunderstanding you, but I don't think so. We'd still need accessor for HeaderSearch::UsedSearchDirs and we don't have the expected DirectoryLookup * lying around, making matching more cumbersome: const llvm::DenseSet<DirectoryLookup *> &UsedSearchDirs = Search.getUsedSearchDirs(); EXPECT_EQ(UsedSearchDirs.size(), 2); EXPECT_EQ(1, llvm::count_if(UsedSearchDirs, [](const auto *SearchDir) { return SearchDir->getName() == "/M1"; })); EXPECT_EQ(1, llvm::count_if(UsedSearchDirs, [](const auto *SearchDir) { return SearchDir->getName() == "/M2"; })); or llvm::DenseSet<std::string> UsedSearchDirsStr; for (const auto *SearchDir : Search.getUsedSearchDirs()) UsedSearchDirsStr.insert(SearchDir->getName()); EXPECT_EQ(UsedSearchDirsStr, (llvm::DenseSet<std::string>{"/M1", "/M2"})); I think having bit-vector, whose indices correspond to the directory names ("/M{i}"), and using operator== for matching is simpler. Let me know if you had something else in mind. |
clang/include/clang/Lex/HeaderSearch.h | ||
---|---|---|
183 | OK. Makes sense. Thanks. | |
clang/lib/Lex/HeaderSearch.cpp | ||
327 | Ah, OK. In that case it makes sense to keep using SearchDirs[Idx]. | |
clang/unittests/Lex/HeaderSearchTest.cpp | ||
276 | I just don’t like the bit-vectors and basically thought of the second option you were suggesting, but maybe that’s really just personal taste. If you’d like to keep the bit-vector, could you change the comment of getSearchDirUsage to something like /// Return a vector of length \c SearchDirs.size() that indicates for each search directory whether it was used. |
Thanks for the feedback!
clang/unittests/Lex/HeaderSearchTest.cpp | ||
---|---|---|
276 | I've updated documentation for HeaderSearch::getSearchDirUsage. |
- which is stable thanks to the bump-ptr-allocation strategy. I don't understand this. In each slab, that's true, but why is it true between objects allocated in different slabs?
- This increases numbers of TUs compiled for LexTests by over 10%. Is there no way around that Frontend dep?
That's referring to the fact that once we allocate new DirectoryLookup with SpecificBumpPtrAllocator, address of that object won't change (unlike with std::vector). This means that we can take the address and use it without worrying about invalidation.
- This increases numbers of TUs compiled for LexTests by over 10%. Is there no way around that Frontend dep?
I'll look into moving clang::ApplyHeaderSearchOptions from Frontend into Lex.
That's referring to the fact that once we allocate new DirectoryLookup with SpecificBumpPtrAllocator, address of that object won't change (unlike with std::vector). This means that we can take the address and use it without worrying about invalidation.
Ah, "stable" made me think of iteration order, but that's not what you meant. Makes sense, thanks!
Hi @jansvoboda11 !
I believe this commit is the root cause of this issue: https://github.com/llvm/llvm-project/issues/53161
Believe it or not, I was (un)lucky enough to run into it. I have bisected the repo and my tests indicate that this commit is the offender. Do you have any obvious idea what could be the problem?
I'm suspecting this is the offender, since it's related to #include_next which is where I get the error:
https://github.com/llvm/llvm-project/commit/8503c688d555014b88849e933bf096035a351586#diff-fb96fa41fef79b66ab53ac1d845ab467defc41c2639e9b0cd60b471b04f898cdR981
To summarize, clang cannot find /usr/include only if I include exactly 511 directories via -I. If I include 510 or 512, then it works fine. I will test if the same happens with smaller powers of two. I have a test script that I can share if you want to test it yourself.
EDIT: the same happen when including exactly 255 directories. It does not happen for 127 and below. It does happen for 1023 and above.
Hi, thanks for reporting this. The reproducer script would be helpful. Looking at the code, I don't see anything obviously wrong.
Sure, I attach it do this message. It assumes you run from a place from which you can run ./build/bin/clang.
Let me know if I can provide anything else to help! It might not be this commit in particular, perhaps the problem is in some allocator class merged earlier.
Thank you!
I'm pretty sure it's this commit that causes the failure.
I now see that HeaderSearch::LookupFile callers perform pointer arithmetics on the out-parameter const DirectoryLookup *&CurDir. They rely on the fact that all DirectoryLookup instances are in contiguous memory. That was true when they were stored in std::vector<DirectoryLookup>, but since they are allocated in llvm::SpecificBumpPtrAllocator<DirectoryLookup> after this patch, that assumption is no longer true.
I have reverted the patch for now and will look into fixing this.
I'll need to figure out how to efficiently replace the pointer arithmetic performed by Preprocessor::LookupFile, or go back to my first solution: keeping std::vector<DirectoryLookup> and updating indices in AddSearchPath...
Replace external uses of const DirectoryLookup * by an optional iterator. That's how they get treated by callers.
clang/include/clang/Lex/HeaderSearch.h | ||
---|---|---|
399 | This is the interesting change in the latest revision. Callers of this function performed pointer arithmetics on const DirectoryLookup *, essentially treating it as an iterator. Since DirectoryLookup objects are no longer stored in contiguous memory, that's unsafe. I'm using an iterator to allow doing that safely and using llvm::Optional to provide the nullability clients take advantage of. I don't love the long name that causes a bunch of formatting changes. Maybe it's worth extracting it out of HeaderSearch and giving it shorter name. I also don't love that the API is not the same as for const DirectoryLookup *, leading to some "unnecessary" changes (e.g. CurDir = nullptr -> CurDir.reset()). I think it might be worth creating custom iterator type that would have the original API, but would behave correctly with the new memory setup. |
I haven’t tried it but is there a particular reason why this can’t be a const DirectoryLookup *?