This is an archive of the discontinued LLVM Phabricator instance.

Adding HardLink Support to VirtualFileSystem.
ClosedPublic

Authored by usaxena95 on Aug 28 2018, 7:52 AM.

Details

Summary

Added support of creating a hardlink from one file to another file.
After a hardlink is added between two files, both file will have the same:

  1. UniqueID (inode)
  2. Size
  3. Buffer

This will bring replay of compilation closer to the actual compilation. There are instances where clang checks for the UniqueID of the file/header to be loaded which leads to a different behavior during replay as all files have different UniqueIDs.

Diff Detail

Repository
rL LLVM

Event Timeline

usaxena95 created this revision.Aug 28 2018, 7:52 AM

Thanks for the change. Having something like this makes total sense.

Mutating existing in-memory nodes looks shaky and requires making Status mutable, which also looks undesirable.
Maybe we could consider adding a new kind of InMemoryNode for hard links that would store the link target? We would have to "resolve" the link on each access, but that would probably allow to avoid changing Status interfaces and get rid of the `UniqueID -> set of files' map. WDYT?

Also some generic comments wrt to the coding style, naming, etc.

include/clang/Basic/VirtualFileSystem.h
80 ↗(On Diff #162862)

It seems Status was designed to be immutable.
Can we copy or reassign the whole status at the points where we need it?

359 ↗(On Diff #162862)

Maybe use the common spelling: "hard link"? Otherwise it feels like there is a HardLink class somewhere.

362 ↗(On Diff #162862)

The links seem to only work for files at this point.
Maybe explicitly document what happens with directories?
Also, maybe document what happens with existing hard links and existing files?

362 ↗(On Diff #162862)

Maybe s/addHardlink/addHardLink?

468 ↗(On Diff #162862)

Accidental whitespace change?

lib/Basic/VirtualFileSystem.cpp
516 ↗(On Diff #162862)

s/buffer/Buffer

645 ↗(On Diff #162862)

s/to_node/ToNode

unittests/Basic/VirtualFileSystemTest.cpp
1608 ↗(On Diff #162862)

Accidental whitespace change?

usaxena95 updated this revision to Diff 163128.Aug 29 2018, 9:40 AM
Created InMemoryHardLink Class which can handle the resolved InMemoryFile
usaxena95 marked 8 inline comments as done.Aug 29 2018, 10:01 AM

Applied the suggested changes. Added InMemoryHardLink as a subclass of InMemoryNode. Modified the tests accordingly.

Could we try removing Status from the base class and move it into InMemoryFile and InMemoryDir?
It shouldn't be too much work and would give safer interfaces for our new hierarchy.

lib/Basic/VirtualFileSystem.cpp
470 ↗(On Diff #163128)

NIT: maybe use a name that follow the naming style for the other node kinds, i.e. IME_HardLink?

526 ↗(On Diff #163128)

Can we get away without storing Name? toString() can just say "hard link to " + ResolveNode->toString()

529 ↗(On Diff #163128)

Why do we need the Stat?
We can get it from ResolveNode if we want to store a copy. Any reason why it would be different?

529 ↗(On Diff #163128)

ResolveNode can't be null, right? Maybe use a reference instead?
Also, maybe make it const?

535 ↗(On Diff #163128)

Could we inline getBuffer and getStatus? Having the client code that's explicit about resolving the references seems like a good idea.

639 ↗(On Diff #163128)

Why not just InMemoryNode*? Null would be the empty hardlink

782 ↗(On Diff #163128)

NIT: maybe call the variable Link or HardLink?
Since it's not a file.

unittests/Basic/VirtualFileSystemTest.cpp
1054 ↗(On Diff #163128)

Maybe also add a test that gets into hard links via directory iterators?
To make sure those give the same results as openFileForRead

usaxena95 marked 7 inline comments as done.Aug 30 2018, 7:52 AM

Could we try removing Status from the base class and move it into InMemoryFile and InMemoryDir?

Moved the Stat and getStatus into the subclasses. User of getStatus now calls getStatus of individual subclasses.

lib/Basic/VirtualFileSystem.cpp
526 ↗(On Diff #163128)

Yeah we can do that. It is just for debug purpose only.

529 ↗(On Diff #163128)

Changed it to reference. Can't make it const as the there it is involved in methods that are non const (like getResolvedFile returns a pointer, getBuffer is not const)

535 ↗(On Diff #163128)

Removed these methods. Moved the logic to client code.

usaxena95 updated this revision to Diff 163330.Aug 30 2018, 7:54 AM
usaxena95 marked 3 inline comments as done.
  • Moved Stat to the subclasses of InMemoryNode
ilya-biryukov added inline comments.Aug 30 2018, 8:49 AM
include/clang/Basic/VirtualFileSystem.h
340 ↗(On Diff #163330)
  • Maybe name it differently, e.g. ResolvedHardLink or HardLinkTarget?
  • Maybe document that how this function behaves for non-null vs null values (i.e. creates dirs/files vs links to files)?
359 ↗(On Diff #163330)

Maybe document that it does nothing for directories?
Also, any reason to not support directories at this point? The limitation seem artificial at this point.

359 ↗(On Diff #162862)

There are still instances of HardLink. Maybe update them too?

lib/Basic/VirtualFileSystem.cpp
485 ↗(On Diff #163330)

Maybe store the result of llvm::sys::path::filename() instead of the full name in the first place?

709 ↗(On Diff #163330)

Why do we need to pass a buffer when creating a hard link?
Maybe just assert it's null?

725 ↗(On Diff #163330)

Why not pass the buffer unchanged?

811 ↗(On Diff #163330)

lookupInMemoryNode never returns link nodes, right? Maybe assert the link case is not possible?

846 ↗(On Diff #163330)

NIT: add else?

848 ↗(On Diff #163330)

NIT: add else?

529 ↗(On Diff #163128)

Both methods could be made const, though, right?
It looks like the buffers are always copied when returned from the file system (because the VFS interface returns a unique_ptr<MemoryBuffer>).
getResolvedFile can also return a const node.

unittests/Basic/VirtualFileSystemTest.cpp
701 ↗(On Diff #163330)

Since gtest only reports one level of locations for the failing assertion, debugging test failures would be hard here (we won't see the actual test locations, the errors will always point into ExpectHardLink function.

We probably want to write a matcher and use EXPECT_THAT, so we can write something like

EXPECT_THAT(File, IsHardLinkTo(LinkTarget))

To write a matcher, we could use the gmock macros:

MATCHER_P(IsHardLinkTo, To, "") {
  llvm::StringRef From = arg;
  llvm::StringRef To = arg;
  // return true when matches, false otherwise
}
usaxena95 updated this revision to Diff 163549.Aug 31 2018, 9:26 AM
usaxena95 marked 15 inline comments as done.
  • Made suggested changes.

Made the suggested changes.

include/clang/Basic/VirtualFileSystem.h
359 ↗(On Diff #163330)

Links to directories cannot be added with the current logic. Resolving paths and iterating directories becomes non trivial. Current implementation supports the use case of "file with multiple names" which is mostly sufficient to mimic the compilation.

lib/Basic/VirtualFileSystem.cpp
529 ↗(On Diff #163128)

Since a raw pointer to the resolved file is returned by getResolvedNode (used by for lookup) , I have also changed some other raw pointer type to const pointer tpye.

Thanks, the interface and implementation look good!
Last drop of nits and suggestions for code simplification.

include/clang/Basic/VirtualFileSystem.h
359 ↗(On Diff #163330)

You're right. E.g. one potential trouble is cycles in the directory tree.
Let's leave them out, directory links do seem like some hard work and we don't have a use-case for them.

364 ↗(On Diff #163549)

InMemoryHardLink is a private implementation detail. Maybe remove it from the docs of the public API?

369 ↗(On Diff #163549)

Maybe expand a little on what a "hard link" is in our implementation?
Specifically, the important bits seem to be:

They are not intended to be fully equivalent to hard links of the classical filesystems, despite the name.
Both the original file and a hard link have the same UniqueID returned in their stats, share the same buffer, etc.
There is no way to distinguish hard links and the original files after they were added.
(this is already in the docs) Only links to files are supported, directories cannot be linked.

370 ↗(On Diff #163549)

Maybe remove these defaulted parameters? They're not used and we'll probably have confusing semantics even if we add support for them (since our hard links don't have their own status, and merely return the status of the original file).
WDYT?

lib/Basic/VirtualFileSystem.cpp
476 ↗(On Diff #163549)

Please use std::string, we usually don't leave out the std:: namespace in LLVM.

523 ↗(On Diff #163549)

Maybe return a reference here too? To indicate this can't be null.

653 ↗(On Diff #163549)

HardLinkTarget can only be InMemoryFile, so maybe change the type of the parameter to const InMemoryFile*?
Clients will be responsible for making sure the target is actually a file, but that seems fine, they already have to resolve the path to the target file.

665 ↗(On Diff #163549)

We don't use Status for links, maybe move creation of Status into the branch that handles files?
This would allow to assert Buffer is not null too

690 ↗(On Diff #163549)

Oh, wow, this used to set the size of the created parent directories to the size of the buffer we're creating. I guess nobody looks at it anyway, so we were never hitting any problems because of this.
Good catch, thanks for fixing this.

803 ↗(On Diff #163549)

NIT: maybe invert condition?
I.e. change to

if (!Node)
  return Node.getError();
//.. rest of the code

to keep the amount of code with higher nesting small. See somewhat related section in LLVM Style Guide

804 ↗(On Diff #163549)

This line seems to have wrong indent.
Running git-clang-format before submitting the code is a convenient way to fix this.

809 ↗(On Diff #163549)

NIT: add llvm_unreachable() to make it clear we don't have a 4th kind of node?

841 ↗(On Diff #163549)

Maybe extract it into a small helper function?

It only pops up twice, but a helper should still give better readability for both cases.

unittests/Basic/VirtualFileSystemTest.cpp
702 ↗(On Diff #163549)

s/IsHardLinktTo/IsHardLinkTo

705 ↗(On Diff #163549)

This matcher is pretty complicated and tests multiple things.
If it fails, it might be hard to track down what exactly went wrong.

Given the name, it seems reasonable for it to simply test that both paths point to the same unique-id.
We don't have to test every other invariant (sizes are the same, buffers are the same) on all the links we create, one test assertion per invariant should be enough (sizes are the same, buffers are the same, etc)

usaxena95 marked 16 inline comments as done.Sep 3 2018, 7:11 AM

Applied suggested changes.

lib/Basic/VirtualFileSystem.cpp
653 ↗(On Diff #163549)

Moved InMemoryFile from detail:<anonymous> namespace to detail namespace in order to be visible in VirtualFileSystem.h

unittests/Basic/VirtualFileSystemTest.cpp
705 ↗(On Diff #163549)

Removed the check for Buffer and size from the matcher. Added the test for size and buffer in the smaller AddHardLinkTest

usaxena95 updated this revision to Diff 163713.Sep 3 2018, 7:12 AM
usaxena95 marked 2 inline comments as done.
  • applied suggested changes
usaxena95 updated this revision to Diff 163720.Sep 3 2018, 7:59 AM
  • Moved helper function into anonymous namespace
ilya-biryukov added inline comments.Sep 3 2018, 9:15 AM
lib/Basic/VirtualFileSystem.cpp
478 ↗(On Diff #163720)

NIT: accept a parameter of type llvm::StringRef instead of const std::string&.

495 ↗(On Diff #163720)

NIT: .str() is redundant (in fact, a pessimization) if parameter of InMemoryNode is a StringRef.

522 ↗(On Diff #163720)

NIT: as noted in the other comment, .str() can be removed if parameter type is changed.

570 ↗(On Diff #163720)

NIT: .str() can be removed

662 ↗(On Diff #163720)

The comment looks outdated

663 ↗(On Diff #163720)

Shouldn't we assert this instead? This is not an invalid input, this is actually breaking our program logic, right? I.e. one shouldn't call addFile with HardLinkTarget set and non-null buffer

677 ↗(On Diff #163720)

NIT: .str() seems redundant

804 ↗(On Diff #163720)

NIT: use llvm::cast<InMemoryFile> instead of static_cast.
It's essentially equivalent with added asserts.

unittests/Basic/VirtualFileSystemTest.cpp
974 ↗(On Diff #163720)

Please use StringRefs everywhere. Twine should only be generally only be used as temporary objects, see the specific guidelines in Twine's docs.
Whenever a reference to an existing string is needed, a StringRef is the best choice.
If you need to create new strings (e.g. concatenation, etc.), use std::string.

976 ↗(On Diff #163720)

s/content/Content

987 ↗(On Diff #163720)

s/link0/Link0
.... and other lowerCamelCase vars --> UpperCamelCase

1090 ↗(On Diff #163720)

This seems equivalent to EXPECT_THAT(Nodes, UnorderedElementsAre("/a", "/a/b", "/c", "/c/d"). Maybe simplify?

Note that iterators might give different file separators on windows. You might want to normalize, e.g. with llvm::sys::path::native(..., Style::posix).
And still watch out for windows buildbot failures, in case the root (/) is interpreted differently on Windows, possibly giving different results.

usaxena95 updated this revision to Diff 163739.Sep 3 2018, 10:58 AM
usaxena95 marked 14 inline comments as done.
  • applied changes

Applied the changes suggested.

lib/Basic/VirtualFileSystem.cpp
677 ↗(On Diff #163720)

Here P is a Twine here which cannot be converted to StringRef

ilya-biryukov accepted this revision.Sep 4 2018, 1:53 AM

Thanks! LGTM with a last drop of NITs.
Do you already have commit access or do you want me to submit this change for you?

include/clang/Basic/VirtualFileSystem.h
369 ↗(On Diff #163739)

NIT: since From should not exist, a comment that it shouldn't be a directory is redundant.

lib/Basic/VirtualFileSystem.cpp
677 ↗(On Diff #163720)

You're right. Sorry for the confusion.

485 ↗(On Diff #163739)

NIT: .data() is redundant (in fact, a pessimization, since StringRef ctor will have to compute the length of the string), string can be implicitly converted to StringRef

613 ↗(On Diff #163739)

NIT: else can be removed, same at the next statement. See the style guide

663 ↗(On Diff #163739)

NIT: maybe put comment into an assertion message?
E.g. assert(!(HardLinkTarget && Buffer) && "HardLink cannot have a buffer")

673 ↗(On Diff #163739)

NIT: end a comment with a full stop

unittests/Basic/VirtualFileSystemTest.cpp
708 ↗(On Diff #163739)

This is effectively a double negation, which is a bit hard to read.
Maybe replace !(a || b != c) to !a && b == c to avoid it?

This revision is now accepted and ready to land.Sep 4 2018, 1:53 AM
usaxena95 updated this revision to Diff 163785.Sep 4 2018, 4:42 AM
usaxena95 marked 9 inline comments as done.
  • fixed the style issues

I don't have commit access. Can you please submit this ?

I don't have commit access. Can you please submit this ?

Will do.

This revision was automatically updated to reflect the committed changes.

Submitted. Thanks for the change!