This is an archive of the discontinued LLVM Phabricator instance.

Adding HardLink Support to VirtualFileSystem.
ClosedPublic

Authored by usaxena95 on Aug 28 2018, 7:52 AM.

Details

Summary

Added support of creating a hardlink from one file to another file.
After a hardlink is added between two files, both file will have the same:

  1. UniqueID (inode)
  2. Size
  3. Buffer

This will bring replay of compilation closer to the actual compilation. There are instances where clang checks for the UniqueID of the file/header to be loaded which leads to a different behavior during replay as all files have different UniqueIDs.

Diff Detail

Event Timeline

usaxena95 created this revision.Aug 28 2018, 7:52 AM

Thanks for the change. Having something like this makes total sense.

Mutating existing in-memory nodes looks shaky and requires making Status mutable, which also looks undesirable.
Maybe we could consider adding a new kind of InMemoryNode for hard links that would store the link target? We would have to "resolve" the link on each access, but that would probably allow to avoid changing Status interfaces and get rid of the `UniqueID -> set of files' map. WDYT?

Also some generic comments wrt to the coding style, naming, etc.

include/clang/Basic/VirtualFileSystem.h
78

It seems Status was designed to be immutable.
Can we copy or reassign the whole status at the points where we need it?

358

Maybe use the common spelling: "hard link"? Otherwise it feels like there is a HardLink class somewhere.

361

The links seem to only work for files at this point.
Maybe explicitly document what happens with directories?
Also, maybe document what happens with existing hard links and existing files?

361

Maybe s/addHardlink/addHardLink?

473

Accidental whitespace change?

lib/Basic/VirtualFileSystem.cpp
502

s/buffer/Buffer

656

s/to_node/ToNode

unittests/Basic/VirtualFileSystemTest.cpp
1640

Accidental whitespace change?

usaxena95 updated this revision to Diff 163128.Aug 29 2018, 9:40 AM
Created InMemoryHardLink Class which can handle the resolved InMemoryFile
usaxena95 marked 8 inline comments as done.Aug 29 2018, 10:01 AM

Applied the suggested changes. Added InMemoryHardLink as a subclass of InMemoryNode. Modified the tests accordingly.

Could we try removing Status from the base class and move it into InMemoryFile and InMemoryDir?
It shouldn't be too much work and would give safer interfaces for our new hierarchy.

lib/Basic/VirtualFileSystem.cpp
470

NIT: maybe use a name that follow the naming style for the other node kinds, i.e. IME_HardLink?

521

Can we get away without storing Name? toString() can just say "hard link to " + ResolveNode->toString()

524

Why do we need the Stat?
We can get it from ResolveNode if we want to store a copy. Any reason why it would be different?

524

ResolveNode can't be null, right? Maybe use a reference instead?
Also, maybe make it const?

530

Could we inline getBuffer and getStatus? Having the client code that's explicit about resolving the references seems like a good idea.

632

Why not just InMemoryNode*? Null would be the empty hardlink

775

NIT: maybe call the variable Link or HardLink?
Since it's not a file.

unittests/Basic/VirtualFileSystemTest.cpp
1054

Maybe also add a test that gets into hard links via directory iterators?
To make sure those give the same results as openFileForRead

usaxena95 marked 7 inline comments as done.Aug 30 2018, 7:52 AM

Could we try removing Status from the base class and move it into InMemoryFile and InMemoryDir?

Moved the Stat and getStatus into the subclasses. User of getStatus now calls getStatus of individual subclasses.

lib/Basic/VirtualFileSystem.cpp
521

Yeah we can do that. It is just for debug purpose only.

524

Changed it to reference. Can't make it const as the there it is involved in methods that are non const (like getResolvedFile returns a pointer, getBuffer is not const)

530

Removed these methods. Moved the logic to client code.

usaxena95 updated this revision to Diff 163330.Aug 30 2018, 7:54 AM
usaxena95 marked 3 inline comments as done.
  • Moved Stat to the subclasses of InMemoryNode
ilya-biryukov added inline comments.Aug 30 2018, 8:49 AM
include/clang/Basic/VirtualFileSystem.h
340
  • Maybe name it differently, e.g. ResolvedHardLink or HardLinkTarget?
  • Maybe document that how this function behaves for non-null vs null values (i.e. creates dirs/files vs links to files)?
358

There are still instances of HardLink. Maybe update them too?

359

Maybe document that it does nothing for directories?
Also, any reason to not support directories at this point? The limitation seem artificial at this point.

lib/Basic/VirtualFileSystem.cpp
485

Maybe store the result of llvm::sys::path::filename() instead of the full name in the first place?

524

Both methods could be made const, though, right?
It looks like the buffers are always copied when returned from the file system (because the VFS interface returns a unique_ptr<MemoryBuffer>).
getResolvedFile can also return a const node.

709

Why do we need to pass a buffer when creating a hard link?
Maybe just assert it's null?

725

Why not pass the buffer unchanged?

811

lookupInMemoryNode never returns link nodes, right? Maybe assert the link case is not possible?

846

NIT: add else?

848

NIT: add else?

unittests/Basic/VirtualFileSystemTest.cpp
701

Since gtest only reports one level of locations for the failing assertion, debugging test failures would be hard here (we won't see the actual test locations, the errors will always point into ExpectHardLink function.

We probably want to write a matcher and use EXPECT_THAT, so we can write something like

EXPECT_THAT(File, IsHardLinkTo(LinkTarget))

To write a matcher, we could use the gmock macros:

MATCHER_P(IsHardLinkTo, To, "") {
  llvm::StringRef From = arg;
  llvm::StringRef To = arg;
  // return true when matches, false otherwise
}
usaxena95 updated this revision to Diff 163549.Aug 31 2018, 9:26 AM
usaxena95 marked 15 inline comments as done.
  • Made suggested changes.

Made the suggested changes.

include/clang/Basic/VirtualFileSystem.h
359

Links to directories cannot be added with the current logic. Resolving paths and iterating directories becomes non trivial. Current implementation supports the use case of "file with multiple names" which is mostly sufficient to mimic the compilation.

lib/Basic/VirtualFileSystem.cpp
524

Since a raw pointer to the resolved file is returned by getResolvedNode (used by for lookup) , I have also changed some other raw pointer type to const pointer tpye.

Thanks, the interface and implementation look good!
Last drop of nits and suggestions for code simplification.

include/clang/Basic/VirtualFileSystem.h
359

You're right. E.g. one potential trouble is cycles in the directory tree.
Let's leave them out, directory links do seem like some hard work and we don't have a use-case for them.

361

InMemoryHardLink is a private implementation detail. Maybe remove it from the docs of the public API?

366

Maybe expand a little on what a "hard link" is in our implementation?
Specifically, the important bits seem to be:

They are not intended to be fully equivalent to hard links of the classical filesystems, despite the name.
Both the original file and a hard link have the same UniqueID returned in their stats, share the same buffer, etc.
There is no way to distinguish hard links and the original files after they were added.
(this is already in the docs) Only links to files are supported, directories cannot be linked.

367

Maybe remove these defaulted parameters? They're not used and we'll probably have confusing semantics even if we add support for them (since our hard links don't have their own status, and merely return the status of the original file).
WDYT?

lib/Basic/VirtualFileSystem.cpp
476

Please use std::string, we usually don't leave out the std:: namespace in LLVM.

525

Maybe return a reference here too? To indicate this can't be null.

654

HardLinkTarget can only be InMemoryFile, so maybe change the type of the parameter to const InMemoryFile*?
Clients will be responsible for making sure the target is actually a file, but that seems fine, they already have to resolve the path to the target file.

666

We don't use Status for links, maybe move creation of Status into the branch that handles files?
This would allow to assert Buffer is not null too

690

Oh, wow, this used to set the size of the created parent directories to the size of the buffer we're creating. I guess nobody looks at it anyway, so we were never hitting any problems because of this.
Good catch, thanks for fixing this.

805–812

NIT: maybe invert condition?
I.e. change to

if (!Node)
  return Node.getError();
//.. rest of the code

to keep the amount of code with higher nesting small. See somewhat related section in LLVM Style Guide

806

This line seems to have wrong indent.
Running git-clang-format before submitting the code is a convenient way to fix this.

811

NIT: add llvm_unreachable() to make it clear we don't have a 4th kind of node?

844–849

Maybe extract it into a small helper function?

It only pops up twice, but a helper should still give better readability for both cases.

unittests/Basic/VirtualFileSystemTest.cpp
717

s/IsHardLinktTo/IsHardLinkTo

720

This matcher is pretty complicated and tests multiple things.
If it fails, it might be hard to track down what exactly went wrong.

Given the name, it seems reasonable for it to simply test that both paths point to the same unique-id.
We don't have to test every other invariant (sizes are the same, buffers are the same) on all the links we create, one test assertion per invariant should be enough (sizes are the same, buffers are the same, etc)

usaxena95 marked 16 inline comments as done.Sep 3 2018, 7:11 AM

Applied suggested changes.

lib/Basic/VirtualFileSystem.cpp
654

Moved InMemoryFile from detail:<anonymous> namespace to detail namespace in order to be visible in VirtualFileSystem.h

unittests/Basic/VirtualFileSystemTest.cpp
720

Removed the check for Buffer and size from the matcher. Added the test for size and buffer in the smaller AddHardLinkTest

usaxena95 updated this revision to Diff 163713.Sep 3 2018, 7:12 AM
usaxena95 marked 2 inline comments as done.
  • applied suggested changes
usaxena95 updated this revision to Diff 163720.Sep 3 2018, 7:59 AM
  • Moved helper function into anonymous namespace
ilya-biryukov added inline comments.Sep 3 2018, 9:15 AM
lib/Basic/VirtualFileSystem.cpp
478

NIT: accept a parameter of type llvm::StringRef instead of const std::string&.

482

NIT: .str() is redundant (in fact, a pessimization) if parameter of InMemoryNode is a StringRef.

509

NIT: as noted in the other comment, .str() can be removed if parameter type is changed.

572–573

NIT: .str() can be removed

653

The comment looks outdated

654

Shouldn't we assert this instead? This is not an invalid input, this is actually breaking our program logic, right? I.e. one shouldn't call addFile with HardLinkTarget set and non-null buffer

675

NIT: .str() seems redundant

796

NIT: use llvm::cast<InMemoryFile> instead of static_cast.
It's essentially equivalent with added asserts.

unittests/Basic/VirtualFileSystemTest.cpp
979

Please use StringRefs everywhere. Twine should only be generally only be used as temporary objects, see the specific guidelines in Twine's docs.
Whenever a reference to an existing string is needed, a StringRef is the best choice.
If you need to create new strings (e.g. concatenation, etc.), use std::string.

981

s/content/Content

992

s/link0/Link0
.... and other lowerCamelCase vars --> UpperCamelCase

1095

This seems equivalent to EXPECT_THAT(Nodes, UnorderedElementsAre("/a", "/a/b", "/c", "/c/d"). Maybe simplify?

Note that iterators might give different file separators on windows. You might want to normalize, e.g. with llvm::sys::path::native(..., Style::posix).
And still watch out for windows buildbot failures, in case the root (/) is interpreted differently on Windows, possibly giving different results.

usaxena95 updated this revision to Diff 163739.Sep 3 2018, 10:58 AM
usaxena95 marked 14 inline comments as done.
  • applied changes

Applied the changes suggested.

lib/Basic/VirtualFileSystem.cpp
675

Here P is a Twine here which cannot be converted to StringRef

ilya-biryukov accepted this revision.Sep 4 2018, 1:53 AM

Thanks! LGTM with a last drop of NITs.
Do you already have commit access or do you want me to submit this change for you?

include/clang/Basic/VirtualFileSystem.h
366

NIT: since From should not exist, a comment that it shouldn't be a directory is redundant.

lib/Basic/VirtualFileSystem.cpp
484

NIT: .data() is redundant (in fact, a pessimization, since StringRef ctor will have to compute the length of the string), string can be implicitly converted to StringRef

615

NIT: else can be removed, same at the next statement. See the style guide

654

NIT: maybe put comment into an assertion message?
E.g. assert(!(HardLinkTarget && Buffer) && "HardLink cannot have a buffer")

669

NIT: end a comment with a full stop

675

You're right. Sorry for the confusion.

unittests/Basic/VirtualFileSystemTest.cpp
723

This is effectively a double negation, which is a bit hard to read.
Maybe replace !(a || b != c) to !a && b == c to avoid it?

This revision is now accepted and ready to land.Sep 4 2018, 1:53 AM
usaxena95 updated this revision to Diff 163785.Sep 4 2018, 4:42 AM
usaxena95 marked 9 inline comments as done.
  • fixed the style issues

I don't have commit access. Can you please submit this ?

I don't have commit access. Can you please submit this ?

Will do.

This revision was automatically updated to reflect the committed changes.

Submitted. Thanks for the change!