Page MenuHomePhabricator

[ELF] Don't consider directories of the same name as libraries
AbandonedPublic

Authored by phosek on Jan 21 2022, 3:23 PM.

Details

Reviewers
MaskRay
mcgrathr
Summary

When processing linker scripts or dependent libraries, if there's a
directory of the same name as the library being searched for, either in
the current directory or earlier in the search order, LLD will try to
open it and report an error. This is because LLD currently uses file
existence check. While we could expand this check and verify that the
file type is not a directory, such a check could be susceptible to
races. Rather, we always try to open the file, but silently ignore the
error and continue with other cases.

Diff Detail

Event Timeline

phosek created this revision.Jan 21 2022, 3:23 PM
phosek requested review of this revision.Jan 21 2022, 3:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 21 2022, 3:23 PM

I'm not yet sure if this is the best solution, but it's one possible solution to an issue we recently discovered.

mcgrathr accepted this revision.Jan 21 2022, 4:06 PM
mcgrathr added a subscriber: mcgrathr.

lgtm

lld/test/ELF/deplibs-dir.s
14

Another good thing to test here is -L%t.dir/notdir/foo when %t.dir/notdir exists as a regular file. With the old bug this will get "Not a directory" (ENOTDIR, which should also be treated the same as ENOENT).

Ideally, I think we'd also test that a regular file that cannot be opened fails as expected rather than being skipped. That would need two matching files in the search path to be sure it doesn't go on and open the file that works. e.g., make the first match in the path be a file without read permissions. It might be hard to reproduce an existing but unopenable file across all operating systems.

18

Matching the exact perror text might not be reliable across operating systems, but for a negative match that might be tolerable. Also of potential concern is if messages could ever get translated vs running under LC_ALL=C.

This revision is now accepted and ready to land.Jan 21 2022, 4:06 PM

ignoreError can renamed to something like skipDirectory.

Can you check whether the error_code is an EISDIR?

MaskRay requested changes to this revision.EditedJan 21 2022, 8:19 PM
MaskRay added subscribers: bd1976llvm, ikudrin.

The code move to Driver.h feels a bit uneasy, hope someone may suggest whether there is a better way.

@arichardson @bd1976llvm @ikudrin

The functionality looks good and I am happy to LG if we can't think of a better way.

lld/ELF/Driver.h
97

This change should be made separately.

107

return func(name) || findFromSearchPaths(name, func);

This revision now requires changes to proceed.Jan 21 2022, 8:19 PM

The code move to Driver.h feels a bit uneasy, hope someone may suggest whether there is a better way.

@arichardson @bd1976llvm @ikudrin

The functionality looks good and I am happy to LG if we can't think of a better way.

I also considered keeping the functions in DriverUtils.cpp and change the argument to take std::function<bool(StringRef)>. It is less efficient than the template argument, but these functions aren't on a critical path so that's probably not an issue. Would that be preferable?

phosek updated this revision to Diff 402417.Jan 24 2022, 12:50 AM
phosek marked 2 inline comments as done.
gulfem added a subscriber: gulfem.Jan 24 2022, 9:03 AM

The code move to Driver.h feels a bit uneasy, hope someone may suggest whether there is a better way.

@arichardson @bd1976llvm @ikudrin

The functionality looks good and I am happy to LG if we can't think of a better way.

I also considered keeping the functions in DriverUtils.cpp and change the argument to take std::function<bool(StringRef)>. It is less efficient than the template argument, but these functions aren't on a critical path so that's probably not an issue. Would that be preferable?

Function definitions remaining in .cpp looks fine. You may use llvm::function_ref<bool(llvm::StringRef)> instead of std::function (which needs heap allocation).

phosek updated this revision to Diff 402588.Jan 24 2022, 10:39 AM
phosek updated this revision to Diff 402600.Jan 24 2022, 11:11 AM
MaskRay added a comment.EditedJan 24 2022, 1:22 PM

Currently a deplib name (${name}) indicates either the literal ${name} or the -l style lib${name}.{a,so}.
We have a conflict if ${name} is foo and there is a directory named foo.

What if we do

if (name.endswith(".a") || name.endswith(".so"))
  findFromSearchPaths
else
  searchLibraryBaseName

?

That should avoid the conflict, too. A directory is unlikely named .so or .a.


I think when the deplib extension was added, I got slightly uneasy with it attempted both findFromSearchPaths and searchLibraryBaseName. And you raised a justifying example which makes users confused...

lld/ELF/InputFiles.cpp
119

Remove ENOTDIR. It is for diagnostics when the file type is expected to be a directory.

We may need to check whether category() is https://en.cppreference.com/w/cpp/error/system_category

lld/ELF/ScriptParser.cpp
319

The comment says skipDirectory but it is actually used for sys::fs::exists(path). The semantic is a bit different which may suggest skipDirectory is not suitable.

Currently a deplib name (${name}) indicates either the literal ${name} or the -l style lib${name}.{a,so}.
We have a conflict if ${name} is foo and there is a directory named foo.

What if we do

if (name.endswith(".a") || name.endswith(".so"))
  findFromSearchPaths
else
  searchLibraryBaseName

?

That should avoid the conflict, too. A directory is unlikely named .so or .a.

I don't think introducing more magical rules about naming patterns ever makes things easier.

It is surprising at first blush that deplibs searches for "foo" and not just what "-lfoo" would do, but it makes sense if you want to allow saying "libfoo.a" in deplibs to avoid matching "libfoo.so" in arcane cases.
So I think just the fix here to make all cases of not-existing-files candidates on search paths be ignored robustly is sufficient. The extra set of candidates that deplibs cases will attempt is tolerable and IMHO it's far preferable to having any more magic in the semantics of the behavior than there already is

lld/ELF/InputFiles.cpp
119

This is inaccurate. ENOTDIR is also an expected error for certain cases of a nonexistent path, for example foo/bar/baz when foo/bar is an existing non-directory file is expected to be ignored and not stop continuing to search the next path candidate. The only case when open without O_DIRECTORY returns ENOTDIR is a case that should be treated as equivalent to ENOENT for this use.

Currently a deplib name (${name}) indicates either the literal ${name} or the -l style lib${name}.{a,so}.
We have a conflict if ${name} is foo and there is a directory named foo.

What if we do

if (name.endswith(".a") || name.endswith(".so"))
  findFromSearchPaths
else
  searchLibraryBaseName

?

That should avoid the conflict, too. A directory is unlikely named .so or .a.

I don't think introducing more magical rules about naming patterns ever makes things easier.

It is surprising at first blush that deplibs searches for "foo" and not just what "-lfoo" would do, but it makes sense if you want to allow saying "libfoo.a" in deplibs to avoid matching "libfoo.so" in arcane cases.
So I think just the fix here to make all cases of not-existing-files candidates on search paths be ignored robustly is sufficient. The extra set of candidates that deplibs cases will attempt is tolerable and IMHO it's far preferable to having any more magic in the semantics of the behavior than there already is

It's not adding more magical rules. The chained findFromSearchPaths and searchLibraryBaseName is replaced with testing just one form.
We can then get rid of llvm::function_ref<bool(StringRef)>.

Currently a deplib name (${name}) indicates either the literal ${name} or the -l style lib${name}.{a,so}.
We have a conflict if ${name} is foo and there is a directory named foo.

What if we do

if (name.endswith(".a") || name.endswith(".so"))
  findFromSearchPaths
else
  searchLibraryBaseName

?

That should avoid the conflict, too. A directory is unlikely named .so or .a.

I don't think introducing more magical rules about naming patterns ever makes things easier.

It is surprising at first blush that deplibs searches for "foo" and not just what "-lfoo" would do, but it makes sense if you want to allow saying "libfoo.a" in deplibs to avoid matching "libfoo.so" in arcane cases.
So I think just the fix here to make all cases of not-existing-files candidates on search paths be ignored robustly is sufficient. The extra set of candidates that deplibs cases will attempt is tolerable and IMHO it's far preferable to having any more magic in the semantics of the behavior than there already is

It's not adding more magical rules. The chained findFromSearchPaths and searchLibraryBaseName is replaced with testing just one form.
We can then get rid of llvm::function_ref<bool(StringRef)>.

I would rather keep the search scheme as simple and general as possible. The ".a" and ".so" suffixes are just conventions (not part of the ELF standard). My hope was to make the autolinking scheme implementable by any ELF linker.

LGTM to the proposed fix from me. If memory serves, I experimented with the same approach (as proposed here) when autolinking was first implemented; however, I decided to make the code as minimal as possible and see what fell out from real world usage as It wasn't clear to me what problems build systems would be able to workaround vs which would be blockers.

MaskRay added a comment.EditedJan 24 2022, 4:08 PM

I would rather keep the search scheme as simple and general as possible. The ".a" and ".so" suffixes are just conventions (not part of the ELF standard). My hope was to make the autolinking scheme implementable by any ELF linker.

LGTM to the proposed fix from me. If memory serves, I experimented with the same approach (as proposed here) when autolinking was first implemented; however, I decided to make the code as minimal as possible and see what fell out from real world usage as It wasn't clear to me what problems build systems would be able to workaround vs which would be blockers.

If we want to keep minimum code, we can just swap searchLibraryBaseName and findFromSearchPaths branches.

I'd really liked that the feature started with -lfoo (instead of foo) in the first place if it were augmented with the ability to link an arbitrary absolute/relative path, since that is unambiguous.
Perhaps the ship has sailed now.

Have we reached a decision on this? I'd like to land this change since it's currently blocking a Clang roll for us.

Have we reached a decision on this? I'd like to land this change since it's currently blocking a Clang roll for us.

Just swap searchLibraryBaseName and findFromSearchPaths branches. It's the simplest/least intrusive fix.

Have we reached a decision on this? I'd like to land this change since it's currently blocking a Clang roll for us.

Just swap searchLibraryBaseName and findFromSearchPaths branches. It's the simplest/least intrusive fix.

That alone is not sufficient if the directory with the same name exists in the current directory as is the case in the attached test case, since we'll take this branch https://github.com/llvm/llvm-project/blob/249a21ab188419b019adea21716f5c779c063de3/lld/ELF/InputFiles.cpp#L493 and fail.

Have we reached a decision on this? I'd like to land this change since it's currently blocking a Clang roll for us.

Just swap searchLibraryBaseName and findFromSearchPaths branches. It's the simplest/least intrusive fix.

That alone is not sufficient if the directory with the same name exists in the current directory as is the case in the attached test case, since we'll take this branch https://github.com/llvm/llvm-project/blob/249a21ab188419b019adea21716f5c779c063de3/lld/ELF/InputFiles.cpp#L493 and fail.

OK. Then move searchLibraryBaseName to the top. I think the comment https://reviews.llvm.org/D117933#3267443 is also relevant to decrease the number of filesystem probing.

phosek abandoned this revision.Feb 8 2022, 1:26 AM

We went with D118498 instead.