As @kadircet mentions in D84912#2184144, findNearbyIdentifier() traverses the whole file if there is no identifier for the word.
This patch ensures give up after 2^N lines in any case.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I feel like I'm doing something totally wrong here :)
Could someone give me an advice?
Hey! Sorry for the late reply, this has been open in my tabs since day 1 just didn't get a chance to take a look at it.
The biggest problem I see is, this is not changing the fact that we are still traversing the whole file:
- You do one traversal over the whole file to find FileLines
- Then possibly two more to find OffsetMin and OffsetMax
You can get rid of the first one by just using getLocForEndOfFile if positionToOffset fails.
For the latter, you can use SourceManager::translateLineCol instead, it uses a cache and cheaper than the linear scan performed by positionToOffset in case of multiple calls. The cache is already populated once we issue the first Cost call, through SM.getSpellingLineNumber
I was reluctant overall as the wins aren't clear. We are still going to traverse the whole file at least once, and readability is hindered but I suppose with the above mentioned two trics, runtime shouldn't be regressed.
Also it is nice to get rid of 2 log calls for each token. Again all of these feels like some micro optimizations that aren't justified.
clang-tools-extra/clangd/XRefs.cpp | ||
---|---|---|
564–565 | Since costs are only compared, we can simplify this: return (Line >= WordLine) ? (Line - WordLine) : 2 * (WordLine - Line) | |
564–565 | this has changed the relative ordering, if we're dropping the log then +1 should now become multiplication by two. | |
568–583 | The initial value of BestCost was chosen so that no-result would be worse than any cost in the eligible range, but better than any cost outside it. (FWIW, I guess sizeof(unsigned) * 8 - 1 -> std::numeric_limits<unsigned>::digits - 1 would be more obvious) | |
570 | I think simplest is to use SourceMgr's line table. int MinLine = (signed)Line - Word.Text.size()/2 , MaxLine = Line + Word.Text.size(); SourceLocation LocMin = SM.translateLineCol(File, std::max(1, MinLine), 1); SourceLocation LocMax = SM.translateLineCol(File, MaxLine); // past-end ok | |
580 | why can't this happen anymore? |
Seems you are right, but we do not compare strings during traversal to find FileLines.
You can get rid of the first one by just using getLocForEndOfFile if positionToOffset fails.
For the latter, you can use SourceManager::translateLineCol instead, it uses a cache and cheaper than the linear scan performed by positionToOffset in case of multiple calls. The cache is already populated once we issue the first Cost call, through SM.getSpellingLineNumberI was reluctant overall as the wins aren't clear. We are still going to traverse the whole file at least once, and readability is hindered but I suppose with the above mentioned two trics, runtime shouldn't be regressed.
Also it is nice to get rid of 2 log calls for each token. Again all of these feels like some micro optimizations that aren't justified.
Thanks for your reply, I will rethink this patch.
clang-tools-extra/clangd/XRefs.cpp | ||
---|---|---|
564–565 | Yes, you are right, my fault here. But should we penalize backwards so hard? | |
580 | As I understand, we call findNearbyIdentifier() only when the word is not an identifier. And Tok is always identifier here. Btw, even if findNearbyIdentifier() could be called for an identifier, I could not get why it's bad to return current position here. Am I wrong? |
FWIW I think if we drop most of the math in favor of using SourceManager's line translation, this doesn't add much complexity and is probably a bit more direct and efficient.
clang-tools-extra/clangd/XRefs.cpp | ||
---|---|---|
580 |
I'm not sure where you're getting that. // Don't use heuristics if this is a real identifier. // Unlikely identifiers are OK if they were used as identifiers nearby. if (Word.ExpandedToken) return nullptr; then "real identifier" here means it's an identifier after preprocessing. (I'm surprised we don't have a test covering this though!)
The point of this function is that *this* occurrence of the word can't be resolved, so let's try another one nearby. |
clang-tools-extra/clangd/XRefs.cpp | ||
---|---|---|
580 | Thanks for your clarification. I will revert this change (and will try to add a test as well). |
clang-tools-extra/clangd/XRefs.cpp | ||
---|---|---|
580 | I failed to create a test for this. if (!(tokenSpelledAt(Tok.location(), TB) || TB.expansionStartingAt(&Tok))) return false; | |
589 | here |
Fix possible UB at bitwise shift.
WordGain => MaxDistance.
Fix LineMin and LineMax values.
Fix comment.
Don't know why this didn't close automatically
Commit: https://reviews.llvm.org/rGd8ba6b4ab3eceb6bbcdf4371d4ffaab9d1a5cebe
Since costs are only compared, we can simplify this:
return (Line >= WordLine) ? (Line - WordLine) : 2 * (WordLine - Line)