This is an archive of the discontinued LLVM Phabricator instance.

[FileManager] Do not call 'real_path' in getFile().
ClosedPublic

Authored by ioeric on Aug 23 2018, 6:23 AM.

Details

Summary

This partially rolls back the change in D48903:
https://github.com/llvm-mirror/clang/commit/89aa7f45a1f728144935289d4ce69d8522999de0#diff-0025af005307891b5429b6a834823d5eR318

real_path can be very expensive on real file systems, and calling it on each
opened file can slow down the compilation. This also slows down deserialized
ASTs for which real paths need to be recalculated for each input files again.

For clangd code completion latency (using preamble):
Before


After

Diff Detail

Event Timeline

ioeric created this revision.Aug 23 2018, 6:23 AM
ioeric updated this revision to Diff 162166.Aug 23 2018, 6:23 AM
  • Add comment
ioeric edited the summary of this revision. (Show Details)Aug 23 2018, 7:11 AM
simark added inline comments.Aug 23 2018, 10:35 AM
lib/Basic/FileManager.cpp
319

What's the rationale for only computing the field if UFE.File is non-null?

Previously, if you looked up the file with openFile == false and then later openFile == true, the RealPathName field would not be set because of this. That doesn't seem right.

326

If the path contains symlinks, doesn't this put a non-real path in the RealPathName field? Won't users (e.g. clangd) use this value thinking it is a real path, when it is actually not?

ioeric added inline comments.Aug 23 2018, 10:51 AM
lib/Basic/FileManager.cpp
319

There has been no guarantee that RealFilePath is always set. I think that's the reason why the acceasor is called tryGetRealPathName.

326

This was the original behavior. In general, File Manager should never call real_path for users because it can be very expensive. Users should call real_path if they want to resolve symlinks. That said, it's fair to say that "RealPathName" is just a wrong name, and we should clean it up at some point.

simark added inline comments.Aug 23 2018, 11:09 AM
lib/Basic/FileManager.cpp
319

The way I understood it was that it could be empty because computing the real path can fail. Not just because we didn't skipped computing it.

326

Ok, then if the goal is not to actually have a real path (in the realpath(3) sense), that's fine. But I think we should rename the field sooner than later, it's really confusing.

That also means that it's kind of useless for us in clangd, so we should always call real_path there and not rely on that field.

ioeric added inline comments.Aug 23 2018, 12:22 PM
lib/Basic/FileManager.cpp
319

I agree that the API is confusing and lack of documentation (and we should fix), but the previous implementation did skip the computation if File is not available though.

326

Ok, then if the goal is not to actually have a real path (in the realpath(3) sense), that's fine. But I think we should rename the field sooner than later, it's really confusing.

+1

That also means that it's kind of useless for us in clangd, so we should always call real_path there and not rely on that field.

I guess it depends on whether you want symlink resolution or not. I know that clangd's index symbol collector resolves symlink explicitly, but I don't think clangd enforces symlink resolution in general?

simark added inline comments.Aug 23 2018, 12:33 PM
lib/Basic/FileManager.cpp
319

I agree that the API is confusing and lack of documentation (and we should fix), but the previous implementation did skip the computation if File is not available though.

Did it have a reason not to? What is the RealPathName field useful for, if it's unreliable?

326

I guess it depends on whether you want symlink resolution or not. I know that clangd's index symbol collector resolves symlink explicitly, but I don't think clangd enforces symlink resolution in general?

If we don't, we probably risk having duplicate results similar to what

https://reviews.llvm.org/D48687

fixed, by with paths differing because of symlinks instead of dot-dots.

ioeric added inline comments.Aug 23 2018, 12:53 PM
lib/Basic/FileManager.cpp
319

I think the biggest concern is the performance.

For example, clangd code completion latency increased dramatically with real_path:
With real_path:


Wihtou real_path:

326

Was the bug addressed in D48687 actually caused by symlinks though? If want we want is absolute paths with dotdot cleaned, it should be much cheaper to call VFS::makeAbsolutePath with remove_dot_dot.

In general, it's unclear whether clangd should resolve symlink (it might not always be what users want), and it should probably be a decision made by the build system integration. I think we would need a much more careful design if we decide to handle symlinks in clangd.

simark added inline comments.Aug 23 2018, 1:13 PM
lib/Basic/FileManager.cpp
319

But with this patch, we are not using real_path anymore, so it can't be the reason for not computing RealPathName when not opening the file. Since the non-real_path method is much more lightweight, would it make a significant difference if we always computed the field?

In the end I don't really mind, I am just looking for a rational explanation to why it's done this way.

326

Was the bug addressed in D48687 actually caused by symlinks though?

No, it was caused by non-relative path that included dotdots, I was extrapolating.

If want we want is absolute paths with dotdot cleaned, it should be much cheaper to call VFS::makeAbsolutePath with remove_dot_dot.

Agreed.

In general, it's unclear whether clangd should resolve symlink (it might not always be what users want), and it should probably be a decision made by the build system integration. I think we would need a much more careful design if we decide to handle symlinks in clangd.

Indeed, if those paths end up displayed in the UI, it could be preferrable to not resolve the symlinks. Using the real_path was kind of the "atomic bomb" solution to get rid of duplicates.

ioeric added inline comments.Aug 23 2018, 1:24 PM
lib/Basic/FileManager.cpp
319

Ohh, sorry that I was confused with the other thread...

Honestly, I also don't know where all these came from (it was implemented many years ago...). For now, I'm just trying to fix the problem with the safest change, as the old behavior, although confusing, should be pretty stable. Changing behavior further would likely cause more problems. I would prefer making further change when we are actually cleaning up or redesigning RealPathName/tryGetRealPath.

ioeric updated this revision to Diff 162337.Aug 24 2018, 1:20 AM
  • Compute absolute paths even when file is not opened.
lib/Basic/FileManager.cpp
319

Ok, I gave this a try, and it seems that computing absolute paths for unopened files doesn't cause any problem. Thanks!

ilya-biryukov accepted this revision.Aug 24 2018, 1:26 AM

LGTM from my side. We need this to unbreak performance in code completion

This revision is now accepted and ready to land.Aug 24 2018, 1:26 AM
This revision was automatically updated to reflect the committed changes.