This is an archive of the discontinued LLVM Phabricator instance.

[llvm-ar] Implement the P modifier.
ClosedPublic

Authored by rupprecht on Feb 7 2019, 2:20 PM.

Details

Summary

GNU ar has a P modifier that changes filename comparisons to use full paths instead of the basename. As noted in the GNU docs, regular archives are not created with full path names, so P is used to deal with archives created by other archive programs (e.g. see the updated absolute-paths.test test case).

Since thin archives use full path names -- paths are relative to the archive -- it seems very error prone to not imply P when dealing with thin archives, so P is implied in those cases. (I think this is a deviation from GNU ar that makes sense).

This fixes PR37436 via https://github.com/ClangBuiltLinux/linux/issues/33.

Event Timeline

rupprecht created this revision.Feb 7 2019, 2:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 2:20 PM
ruiu added a comment.Feb 7 2019, 2:37 PM

I played with ar and found that the following behavior is pretty odd.

Create an archive

$ mkdir x
$ echo 123 > x/foo.txt
$ ar rcS archive.a x/foo.txt

Directory name is not stored to archive.a

$ cat archive.a
!<arch>
foo.txt/ 0 0 0 644 4 `
123

But you can extract foo.txt with any directory name

$ ar x archive.a bar/foo.txt
$ cat foo.txt
123

If I understand correctly, ar by default ignores directory names in many situations like the above, and "P" flag is to make it to not ignore directory names in a pathname. It looks like the default behavior is broken. Is that important for compatibility? I wonder if there's a chance to make "P" the default behavior.

I played with ar and found that the following behavior is pretty odd.

Create an archive

$ mkdir x
$ echo 123 > x/foo.txt
$ ar rcS archive.a x/foo.txt

Directory name is not stored to archive.a

$ cat archive.a
!<arch>
foo.txt/ 0 0 0 644 4 `
123

But you can extract foo.txt with any directory name

$ ar x archive.a bar/foo.txt
$ cat foo.txt
123

If I understand correctly, ar by default ignores directory names in many situations like the above, and "P" flag is to make it to not ignore directory names in a pathname. It looks like the default behavior is broken. Is that important for compatibility? I wonder if there's a chance to make "P" the default behavior.

Yes, the more I dug into pathnames, the more confusing it got... just to clarify, directory names are never stored in the non-thin archive format; the P option is only for whether full path names are used for comparisons (e.g. reading/extracting a single member from an archive, deleting a single member from an archive, etc.)

Your example is odd, but I don't think it would be invoked this way. Although making P the default would make your example make sense, the counterexample would be confusing in a more normal case:

$ mkdir x
$ echo 123 > x/foo.txt
$ ar rcSP archive.a x/foo.txt

# Directory name still not stored even though P is present
$ cat archive.a
!<arch>
foo.txt/        0           0     0     644     4         `
123

# The file we just added is there, right?
$ ar tP archive.a x/foo.txt
no entry x/foo.txt in archive

# Nope, we have to chop off the directory ourselves to find it.
$ ar tP archive.a foo.txt
foo.txt
ruiu accepted this revision.Feb 7 2019, 3:13 PM

LGTM

Hmm, that's right. That's probably a good reason why we can't make P the default behavior.

ar doesn't store directory names to an archive file, but it tries to hide that fact from the user in some degree, but the mismatch leaks out in many situations like the above. Probably, ar shouldn't have tried to do that from the beginning, but we are perhaps 50+ years too late to fix it.

This revision is now accepted and ready to land.Feb 7 2019, 3:13 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 20 2019, 5:06 PM

Hmm, that's right. That's probably a good reason why we can't make P the default behavior.

Doesn't that make the output dependent on the build directory path? In Chromium, we're very careful to try and make the build independent of the build directory name for deterministic builds. I think changing the default here is bad for this reason. If you have your mind set on this however, is there an opt out flag? (We don't use llvm-ar in chromium at the moment so this doesn't affect us, but it's maybe a concern you didn't think about, so I figured I'd mention it.)

Hmm, that's right. That's probably a good reason why we can't make P the default behavior.

Doesn't that make the output dependent on the build directory path? In Chromium, we're very careful to try and make the build independent of the build directory name for deterministic builds. I think changing the default here is bad for this reason. If you have your mind set on this however, is there an opt out flag? (We don't use llvm-ar in chromium at the moment so this doesn't affect us, but it's maybe a concern you didn't think about, so I figured I'd mention it.)

The member paths in thin archives are relative to the archive itself. So suppose you have two build trees and you want to cmp the difference of every file in each tree to make sure they're bit-for-bit identical. Won't the archive members be the same relative path to the archive within each build tree? If so, the thin archive contents will be bit-for-bit identical.

Hmm, that's right. That's probably a good reason why we can't make P the default behavior.

Doesn't that make the output dependent on the build directory path? In Chromium, we're very careful to try and make the build independent of the build directory name for deterministic builds. I think changing the default here is bad for this reason. If you have your mind set on this however, is there an opt out flag? (We don't use llvm-ar in chromium at the moment so this doesn't affect us, but it's maybe a concern you didn't think about, so I figured I'd mention it.)

The member paths in thin archives are relative to the archive itself. So suppose you have two build trees and you want to cmp the difference of every file in each tree to make sure they're bit-for-bit identical. Won't the archive members be the same relative path to the archive within each build tree? If so, the thin archive contents will be bit-for-bit identical.

Ah, "full path" doesn't mean "absolute path" here. Then it's all good, thanks!