Clean pathnames in FileManager.AbandonedPublicActions

Authored by ppluzhnikov on Mar 15 2022, 1:15 PM.

Details

Reviewers
 dexonsmith benlangmuir bnbarham arphaman vsapsai mstorsjo rnk ilya-biryukov
Summary

This reduces visual noise ("./foo.h" -> "foo.h") in error messages.

Fix all tests which fail as a result.

This also fixes a class of failures when FILE is used in modules, and
the module specifies 'textual header "foo.h"' but Clang has "./foo.h"
included elsewhere.

Finally this should fix Windows failure from https://reviews.llvm.org/D121658

Diff Detail

Event Timeline

ppluzhnikov created this revision.Mar 15 2022, 1:15 PM
Herald added a project: Restricted Project. Mar 15 2022, 1:15 PM
ppluzhnikov requested review of this revision.Mar 15 2022, 1:15 PM
Herald added a project: Restricted Project. Mar 15 2022, 1:15 PM

Fix Win x64 failures.

Herald added a project: Restricted Project. Mar 17 2022, 11:01 AM

Use single quotes in sed -- don't want shell expansion there.

I haven't had time yet to think through the implications of this.

At first glance this seems fine/correct/great. The only case I know of where ./ means something is if you're running an executable, some shells block finding executables in the current path without it. I don't think the FileManager is used for cleaning up relative paths to executables (in the unlikely case such a usecase grows, the caller can surely handle this!).

However, FileManager changes sometimes have odd side effects... and it's possible that somewhere in clang relies on having FileManager::getFileRef() return precisely the same path that was requested. Tagging a few other people that have some context... please share your opinions!

@ppluzhnikov, can you give more context on how this interacts with https://reviews.llvm.org/D121658? I had a quick look but it wasn't immediately obvious.

bnbarham added a comment.EditedMar 18 2022, 11:13 AM

However, FileManager changes sometimes have odd side effects... and it's possible that somewhere in clang relies on having FileManager::getFileRef() return precisely the same path that was requested. Tagging a few other people that have some context... please share your opinions!

Wouldn't that already not be the case though (ie. given RedirectingFileSystem and use-external-names is currently a thing)? I know we're wanting to change that, but I don't *know* of anywhere that depends on this currently.

@ppluzhnikov, can you give more context on how this interacts with https://reviews.llvm.org/D121658? I had a quick look but it wasn't immediately obvious.

If I understand correctly, the failing tests in that patch are failing because they're always expecting "/" and since sys::append is used, it's now "\\" on Windows. The remove dots change doesn't fix those, since the tests would still need to be updated to remove the "./" (which is most of the tests in this patch). There's also some others where I wouldn't expect them to be failing in this patch, eg. the ones from / -> {{[/\\]}}.

Are there tests that we can't just fix to expect either / or \\? Why do we need to change the underlying behaviour here at all?

@ppluzhnikov, can you give more context on how this interacts with https://reviews.llvm.org/D121658? I had a quick look but it wasn't immediately obvious.

D121658 broke Winx64 test due to "./" vs. ".\\" mismatch. This change removes "./", so hopefully after this goes in D121658 will no longer break anything.

There's also some others where I wouldn't expect them to be failing in this patch, eg. the ones from / -> {{[/\\]}}.

These are failing because remove_dots (un-intuitively) also changes "foo/bar\\baz" to "foo\\bar\\baz"

Are there tests that we can't just fix to expect either / or \\?

That's what this change is doing (but there is more Winx64 cleanup pending; sorry, I should have sent this as a draft, but I am still learning the ropes here).

Why do we need to change the underlying behaviour here at all?

Sorry, I didn't understand that comment.

There's also some others where I wouldn't expect them to be failing in this patch, eg. the ones from / -> {{[/\\]}}.

These are failing because remove_dots (un-intuitively) also changes "foo/bar\\baz" to "foo\\bar\\baz"

Are there tests that we can't just fix to expect either / or \\?

That's what this change is doing (but there is more Winx64 cleanup pending; sorry, I should have sent this as a draft, but I am still learning the ropes here).

Ah, I didn't realise that remove_dots also changed slashes.

Why do we need to change the underlying behaviour here at all?

Sorry, I didn't understand that comment.

I was referring to the other patch here. I was under the impression that you started this patch to fix the failures in D121658, but perhaps that isn't the case and you just thought this was a generally nice cleanup.

FWIW, I quite like the clean up effect that this patch has (assuming we are able to convince ourselves that it's safe/correct for FileManager to do this).

• Removing the dots feels like pure goodness.
• Canonicalizing the slashes also seems potentially nice (I had missed that remove_dots() does this), as long as FileManager is respecting any user-specified path style on Windows. Note that the style could be either sys::path::Style::windows_slash or sys::path::Style::windows_backslash, but your remove_dots call will be following sys::path::Style::native. Although, it could also be hard to reason about, if some paths in the output are trafficked through FileManager and get cleaned up, while others are used directly and don't get cleaned up.

Given the Windows slash behaviour, would be good to get a Windows reviewer in, maybe @mstorsjo or @rnk.

That's what this change is doing (but there is more Winx64 cleanup pending; sorry, I should have sent this as a draft, but I am still learning the ropes here).

No worries; sorry for jumping in guns blazing. (Drafts are a new feature, and I've been afraid to touch them myself :).)

BTW, if this still isn't ready for review, you can mark it as "changes planned".

ppluzhnikov planned changes to this revision.EditedMar 18 2022, 2:57 PM

Pending further Win x64 test failure cleanup.

rnk added a comment.Mar 18 2022, 3:57 PM

I've been somewhat afraid to touch this code because of symlinks. Is that something we need to worry about? Consider this path: root/symlink/../foo.h. remove_dots will turn this into root/foo.h, but if symlink points to some unrelated directory, the .. directory entry points to something other than root. I can't imagine that people rely on this behavior often, but I could be wrong.

If the consequences are mostly that diagnostics and __FILE__ in such a case refer to non-existent files, I can live with that, it's just a moderate reduction in usability for what seems like unreasonable usage.

I have also wanted to do slash canonicalization at this exact point, to respect the user setting.

I've been somewhat afraid to touch this code because of symlinks. Is that something we need to worry about? Consider this path: root/symlink/../foo.h. remove_dots will turn this into root/foo.h, but if symlink points to some unrelated directory, the .. directory entry points to something other than root. I can't imagine that people rely on this behavior often, but I could be wrong.

remove_dots doesn't remove .. by default, it needs to be passed true to do that. This is only removing ..

More Debian and Win x64 fixes.

I've been somewhat afraid to touch this code because of symlinks. Is that something we need to worry about? Consider this path: root/symlink/../foo.h. remove_dots will turn this into root/foo.h, but if symlink points to some unrelated directory, the .. directory entry points to something other than root. I can't imagine that people rely on this behavior often, but I could be wrong.

remove_dots doesn't remove .. by default, it needs to be passed true to do that. This is only removing ..

Yeah, I'd be against doing ..-removal unless we had a cheap-enough way to detect if that was symlink-incorrect and skipped it in that case. But maybe it'd be worth a comment here explicitly saying ..s were intentionally not being canonicalized.

I've been somewhat afraid to touch this code because of symlinks. Is that something we need to worry about? Consider this path: root/symlink/../foo.h. remove_dots will turn this into root/foo.h, but if symlink points to some unrelated directory, the .. directory entry points to something other than root. I can't imagine that people rely on this behavior often, but I could be wrong.

remove_dots doesn't remove .. by default, it needs to be passed true to do that. This is only removing ..

Yeah, I'd be against doing ..-removal unless we had a cheap-enough way to detect if that was symlink-incorrect and skipped it in that case. But maybe it'd be worth a comment here explicitly saying ..s were

A couple more thoughts:

• Probably we should change getDirectoryRef at the same time.
• Mabye this cleanup could mostly contained to there (rather than having it in both places).
• Rely on getDirectoryRef(parent_path(Filename)) to clean up the directory.
• Construct "clean(er)" filename from Dir->getName() and sys::path::filename(Filename)).
• Call remove_leading_dotslash() in case the dir is ".".
• ... then go down and stat the file.
• I was a bit worried about the hacks for the VFS with "use-external-names=true" interfering with my suggestion above, but I looked and I think it's okay.
• Hacks: see logic for when Status.getName() != Filename: Filename gets replaced with what Status returned.
• Seems like getDirectoryRef() doesn't have this hack so I don't think it'll interfere here.
• BTW, @bnbarham and I talked through a way to remove this hack (to model exposing the external name without replacing the access name); hopefully it won't stick around too much longer!
• It's probably not very hard/expensive to do ..-removal correctly (correctly in all cases, by not removing ..s unless stat proves it's safe).
• Call getDirectoryRef() as above.
• If there are any internal "..", try removing them and call getDirectoryRef() again. If they're successful and have the same inode, use the cleaner name.
• The extra stat happens only once per directory that has a .. component. Maybe not too bad?
• ... again, probably better to do this internally to getDirectoryRef()
• ... certainly seems better to land this as a second incremental step, since there's more potential for trouble.
rnk added a comment.Mar 23 2022, 3:40 PM

Ignoring the ".." path components for the moment, is this patch good to go as is? It doesn't affect that behavior.

Ignoring the ".." path components for the moment, is this patch good to go as is? It doesn't affect that behavior.

The failure log here doesn't show why that test is failing, but I reproduced the failure on my home Windows 10 machine, and I am still digging into the failure reason.

One thing I noticed is this diff from running clang-tidy:

< Suppressed 15 warnings (8 in non-user code, 7 with check filters).
> Suppressed 38 warnings (31 in non-user code, 7 with check filters).

It appears that additional files are treated as "non-user code" after the patch.

However I don't think that's the root cause of the failure: when I added --header-filter='.*' I got:

> Suppressed 7 warnings (7 with check filters).

but the test still failed.

Ignoring the ".." path components for the moment, is this patch good to go as is? It doesn't affect that behavior.

Besides the test failure, the other thing is considering what to do with getDirectoryRef(), which might make sense to update in the same patch to avoid inconsistency. See 1st/2nd bullets of https://reviews.llvm.org/D121733#3393732.

Fix more UNIX and Winx64 failures.

Fix one more Winx64 failure.

Fix FIXME in CanonicalIncludesTests.cpp (yet another Winx64 failure).

Fix clangd test failure.

ilya-biryukov requested changes to this revision.May 11 2022, 10:37 AM

Please allow some time for the clangd team to review this before landing.
Changes to filenames used to cause unintended consequences for us before. We switched to using file UIDs since then, but we're still not sure it's not going to break us.

Also see other comments, there are a few important issues.

clang-tools-extra/clangd/index/CanonicalIncludes.cpp
24

Don't specify sizes of arrays explicitly. This is error prone.

546

Why do we change the order of elements here?
Please revert, this increases the diff and is not relevant to the actual change.

670

This line introduces a memory leak.
Notice how the previous version had a static variable.

677

This function is on a critical path. We don't want to pay for Canonicalize on every call to it.
Please create a static variable and initialize in a lambda if that's absolutely necessary.

static auto *Mapping = []{ StringMap *Mapping = new ...; /* init code*/ return Mapping; }();
This revision now requires changes to proceed.May 11 2022, 10:37 AM
clang/lib/Basic/FileManager.cpp
218

this is actually breaking the contract of FileEntryRef, is this the point?

/// A reference to a \c FileEntry that includes the name of the file as it was
/// accessed by the FileManager's client.

I don't see mention of this contract being changed explicitly anywhere, if so can you mention it in the commit message and also update the documentation? (the same applies to DirectoryEntryRef changes as well)

I am not able to take a detailed look at this at the moment, but this doesn't feel like a safe change for all clients. Because people might be relying on this contract without explicitly testing the behaviour for "./" in the filenames. So tests passing (especially when you're just updating them to not have ./) might not be implying it's safe.

Please note that this patch still breaks Winx64 tests,
and that I marked it as "further changes required" / not ready for review
some time ago.

ppluzhnikov planned changes to this revision.May 11 2022, 11:16 AM
clang-tools-extra/clangd/index/CanonicalIncludes.cpp
24

std::array requires size.

I could use std::vector instead, at the cost of an extra allocation.

546

Note that the elements are inserted into a map
(after commit b3a991df3cd6a; used to be a vector before).

Also note that there are duplicates, e.g.

{"bits/typesizes.h", "<cstdio>"},
{"bits/typesizes.h", "<sys/types.h>"},

which can't work as intended / is already broken.

Sorting helps to find these duplicates.

670

No, it does not. This function is called only once to initialize a static variable:

static const auto *SystemHeaderMap = GetHeaderMapping();
677

This function is only called once.

clang/lib/Basic/FileManager.cpp
218

I chased this comment down to commit 4dc5573acc0d2e7c59d8bac2543eb25cb4b32984.

The commit message says:

This commit introduces a parallel API to FileManager's getFile: getFileEntryRef, which returns
a reference to the FileEntry, and the name that was used to access the file. In the case of
a VFS with 'use-external-names', the FileEntyRef contains the external name of the file,
not the filename that was used to access it.

So it appears that the comment itself is not quite correct.

It's also a bit ambiguous (I think) -- "name used to access the file"
could be interpreted as the name which clang itself used to access
the file, and not necessarily the name client used.

people might be relying on this contract without explicitly testing the behaviour for "./" in the filenames.

That's a possibility.

I am not sure how to evaluate the benefit of this patch (avoiding noise everywhere) vs. possibly breaking clients.

clang-tools-extra/clangd/index/CanonicalIncludes.cpp
24

Use raw arrays.

546

This refactoring makes sense, but please split this into a separate change.

670

There is no guarantee someone won't run this again by mistake in the future revisions.
Make this function leak-free, don't rely on global invariants.

It's easy to do with a lambda trick from the next comment.

clang/lib/Basic/FileManager.cpp
218

While the comment is not currently accurate due to use-external-names, the goal is to eliminate that behaviour and have a separate API for getting the external names. It's maybe still fine to eliminate ./ here, I don't have a strong opinion.

It's also a bit ambiguous (I think) -- "name used to access the file"

could be interpreted as the name which clang itself used to access
the file, and not necessarily the name client used.

It means the name passed to getFileRef.

clang-tools-extra/clangd/index/CanonicalIncludes.cpp
546
ppluzhnikov abandoned this revision.Mon, Jun 6, 8:27 AM

This proved to be too hard :-(
A smaller change: https://reviews.llvm.org/D126396 fixed one aspect of this problem.