Page MenuHomePhabricator

[llvm-ar] Add output option for extract operation
Needs RevisionPublic

Authored by kongyi on Oct 24 2019, 5:20 PM.

Details

Summary

Both GNU ar or llvm-ar currently only supports extracting to the current working
directory. This creates a lot of complication when using it as part of build
process with both the llvm-ar tool path and the output file path are relative.
This change introduces --output option for extract operation, to coincide with
most other llvm-binutils tools.

Diff Detail

Event Timeline

kongyi created this revision.Oct 24 2019, 5:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2019, 5:20 PM
kongyi updated this revision to Diff 226364.Oct 24 2019, 5:25 PM

This is a nice addition to llvm-ar as I discussed with Yi offline, but I want to hear from others.

llvm/test/tools/llvm-ar/extract.test
20

There will not be direct x operation tests with this change. Adding a new test is probably better. Please also test extracting more than one file.

This looks good to me, although with llvm-ar there is a balance between adding useful functionality and keeping command compatibility with other archivers. I've added Rui as a reviewer in case.

ruiu added a comment.Oct 25 2019, 6:45 AM

I think I'm leaning toward not adding this option at the moment for the compatibility reason. IIUC, llvm-ar was created because otherwise it would have been hard to create archives containing symbols from bitcode files, and except that feature, it was intended to be a drop-in replacement. If you land this patch and start depending on the feature, you can't easily go back to the other ar commands.

If llvm-ar is the most popular ar command, we can make a new option de-fact standard, but unfortunately that's not the case.

This option doesn't really depend on LLVM. If this feature is useful, this would be useful for other ar commands, and adding this option to a major ar command would maximizes the benefit. Maybe you should propose this option to GNU first?

MaskRay accepted this revision.Oct 25 2019, 9:44 AM

I think I'm leaning toward not adding this option at the moment for the compatibility reason. IIUC, llvm-ar was created because otherwise it would have been hard to create archives containing symbols from bitcode files, and except that feature, it was intended to be a drop-in replacement. If you land this patch and start depending on the feature, you can't easily go back to the other ar commands.

If llvm-ar is the most popular ar command, we can make a new option de-fact standard, but unfortunately that's not the case.

This option doesn't really depend on LLVM. If this feature is useful, this would be useful for other ar commands, and adding this option to a major ar command would maximizes the benefit. Maybe you should propose this option to GNU first?

I get a positive response from Nick Clifton https://sourceware.org/ml/binutils/2019-10/msg00186.html , so it is likely the option can be added to GNU ar as well. I'll have to figure out how to implement that, though...

This revision is now accepted and ready to land.Oct 25 2019, 9:44 AM

Not really opposed to this patch, but I think ruiu's comment makes sense to think about:

Maybe you should propose this option to GNU first?

+1 to this. It'd be good to broaden the audience to people that have been working with ar for longer.

Can you say more about the use case? A possible counter argument for the presented use case would be that if you need to extract your archive to a different directory, you should have just fetched it to the right place to begin with.

build process with both the llvm-ar tool path and the output file path are relative

Does the path to llvm-ar affect the extraction? I think paths are either interpreted relative to cwd or the archive itself, not the tool path (which is commonly /usr/bin/llvm-ar)

llvm/tools/llvm-ar/llvm-ar.cpp
549

I think llvm::sys::path::append() does this more generally (e.g. if outputFilePath is foo/ -> avoids foo//bar.a)

Can you say more about the use case? A possible counter argument for the presented use case would be that if you need to extract your archive to a different directory, you should have just fetched it to the right place to begin with.

The change is motivated by https://r.android.com/1146882. We need to run llvm-ar to extract and repack the same libgcc.a to multiple variants of it.

build process with both the llvm-ar tool path and the output file path are relative

Does the path to llvm-ar affect the extraction? I think paths are either interpreted relative to cwd or the archive itself, not the tool path (which is commonly /usr/bin/llvm-ar)

It does not affect the extraction, however without the option it creates a lot of hassle for projects which their build system supplies the path for a prebuilt llvm-ar relative to the project root directory.

srhines added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
83

directory to extract to

kongyi updated this revision to Diff 226536.Oct 26 2019, 2:11 AM
kongyi marked 3 inline comments as done.

I posted a GNU ar patch yesterday https://sourceware.org/ml/binutils/2019-10/msg00193.html . If they accept the name --output, we can proceed with this patch.

ruiu added a comment.Oct 27 2019, 7:09 PM

I posted a GNU ar patch yesterday https://sourceware.org/ml/binutils/2019-10/msg00193.html . If they accept the name --output, we can proceed with this patch.

That's great! Thank you for doing that.

binutils 2.34 will include ar --output. https://sourceware.org/ml/binutils/2019-10/msg00271.html

llvm/test/tools/llvm-ar/extract.test
28

We probably also need a test that --output is after the extracted member. This also works. (e.g. ar x a.a a.txt --output=dir)

kongyi marked an inline comment as done.Oct 30 2019, 12:09 PM
kongyi added inline comments.
llvm/test/tools/llvm-ar/extract.test
28

The help message specifies that options has to be placed before the operation flags. Although we accept this, I don't think a test is necessary to assert the undefined behaviour.

MaskRay accepted this revision.Oct 30 2019, 3:00 PM
MaskRay added inline comments.
llvm/test/tools/llvm-ar/extract.test
28

I don't have more comments. Probably worth waiting a bit for another opinion.

For shell scripts that do llvm-ar x a.a $x, they probably should switch to llvm-ar x -- a.a "$x", which is safer if $x can be weird things like --output /home/user.

ruiu added a comment.Oct 31 2019, 1:44 AM

Thank you very much for adding it to GNU ar! Great work.

It looks like GNU ar doesn't allow to extract files to a parent directory, so that ar with the --output option wouldn't clobber users' files outside of the current directory. Maybe we should add the same safety harness?

Full disclosure: I'm coming from a system where there is only one ar tool, and using others isn't really supported. As such, I'm not bothered by adding new features to the tools that don't (yet) exist in GNU tools. If we do that more widely, we'll be held back in improving functionality by whatever the GNU community decide they want to accept. If we're going to start rejecting new options in llvm-ar, because GNU doesn't have/want them, we should start considering whether we should not be adding new options to llvm-objcopy/llvm-objdump/llvm-readelf etc, all of which I believe have options already that aren't in their GNU equivalents. Indeed, llvm-ar does too. It seems relatively unlikely to me that people will switch their binutils piecewise (although I could easily be wrong), so being concerned with one tool and not others seems inconsistent at least.

That all being said, I do agree that we should be careful with option naming. If it's not going to be accepted in other tools, we should be careful what we call the switch. "--output" would be an obvious example of a name that could clash, so discussing with other communities like GNU was the right course of action. I guess if they'd not wanted it, that shouldn't have stopped the option being added (if there was a good use-case for it), but it would have meant perhaps a different switch name that is less likely to clash in the future.

llvm/test/tools/llvm-ar/extract.test
28

I'd be wary about calling it undefined behaviour. It's common for usage to list options then inputs, but that doesn't mean specifying them the other way around is undefined in other tools. Indeed, it's not uncommon for people to do something like "llvm-readelf input.o --sections" or similar. If we want to require "--output" (and I'd add the other similar arguments for consistency too) be before the command, it should be more explicitly stated in both the help text and command guide. Speaking of which, please add the option to the llvm-ar CommandGuide documentation.

I'm relatively ambivalent as to whether this and other long options should be allowed afterwards. I'd lean towards not for llvm-ar because of its weird command-line rules. I also feel like this should be enforced and tested, to stop people relying on such behaviour. In other words, testing that --output after the archive name would attempt to add/extract etc a file called "--output". When we can easily avoid exposing undefined behaviour, I think we should.

ruiu added a comment.Nov 1 2019, 4:54 AM

Full disclosure: I'm coming from a system where there is only one ar tool, and using others isn't really supported. As such, I'm not bothered by adding new features to the tools that don't (yet) exist in GNU tools. If we do that more widely, we'll be held back in improving functionality by whatever the GNU community decide they want to accept. If we're going to start rejecting new options in llvm-ar, because GNU doesn't have/want them, we should start considering whether we should not be adding new options to llvm-objcopy/llvm-objdump/llvm-readelf etc, all of which I believe have options already that aren't in their GNU equivalents. Indeed, llvm-ar does too. It seems relatively unlikely to me that people will switch their binutils piecewise (although I could easily be wrong), so being concerned with one tool and not others seems inconsistent at least.

That all being said, I do agree that we should be careful with option naming. If it's not going to be accepted in other tools, we should be careful what we call the switch. "--output" would be an obvious example of a name that could clash, so discussing with other communities like GNU was the right course of action. I guess if they'd not wanted it, that shouldn't have stopped the option being added (if there was a good use-case for it), but it would have meant perhaps a different switch name that is less likely to clash in the future.

I agree with you that we shouldn't be too conservative. We should be able to innovate without worrying too much about compatibility with other tools. That being said, at the same time we should make a reasonable effort to keep tools compatible because that's good for users, so I think asking GNU people to add the same functionality made sense. In particular, I don't think we want llvm-ar (which is a drop-in replacement from the beginning) to diverge too much from other ar tools, and that's perhaps different from llvm-{objdump,readelf,etc} tools which originally had a completely different set of command line options from GNU and gradually gain GNU-compatible options. So, I wouldn't oppose to a new option, but I think we should ask other tools to add the same functionality (which MaskRay did very well for this case!)

I agree that inventing an option that could clash with future GNU addition should be avoided. We can ask GNU people whether they'd like to add the same functionality. Their rejection should not block our innovation, because we may still get a commitment from them that they will not intentionally use the same option for an incompatible feature. It seems the GNU ar commit as submitted by Nick Clifton only works for relative/dir not /absolute/dir or .. (my original one works for these case), I'll work with him to fix the problem, but in any case we can proceed with the llvm-ar change once the review concerns are all addressed.

llvm/test/tools/llvm-ar/extract.test
28

I'm relatively ambivalent as to whether this and other long options should be allowed afterwards.

To clarify, do we want to support the following forms

llvm-ar x --output=dir a.a  # An existing test archive-symtab.test does `llvm-ar rcs --format=bsd %t.a`
llvm-ar --output=dir x a.a

but not llvm-ar x a.a --output=dir?

jhenderson added inline comments.Nov 4 2019, 1:06 AM
llvm/test/tools/llvm-ar/extract.test
28

I don't have strong feelings on it, but if I were to cast a vote, I'd go go with that suggestion (not allow --output=dir afterwards). My reasoning is that not allowing llvm-ar x a.a --output=dir may in fact be easier in some ways: how else do you pass in filenames starting with - after all? The example you gave above of the need to switch to adding -- is an example of why it may be better to not to allow it.

Like I said though, no strong feelings, so if others prefer a different approach, that's cool with me.

ruiu added a comment.Nov 5 2019, 6:45 AM

So, I think Nick Clifton's modification to not clobber parent directories is a good safeguard, and we should implement the same thing to protect our users. What do you think?

So, I think Nick Clifton's modification to not clobber parent directories is a good safeguard, and we should implement the same thing to protect our users. What do you think?

I think disallowing .. and absolute paths of archive members is sufficient. Both --output=../dir and --output=/abs/path can be allowed. The rationale is that --output is and should be controlled by the developer, while paths of archive members are more or less uncontrollable.

We may have already disallowed .. and absolute paths of archive members: error: truncated or malformed archive (string table at long name offset 0not terminated), but the pointer arithmetic in lib/Object/Archive.cpp is a bit complex and I cannot confirm for now.

Another question is about the output of llvm-ar --output=/tmp/c xv a.a if a.a contains file. I think we should probably print x - /tmp/c/file instead of x - file. The v modifier hasn't been implemented for the x operation, though.

llvm/test/tools/llvm-ar/extract.test
28

Not allowing llvm-ar x a.a --output=dir looks good to me. It provides a bit more safeguard (llvm-ar x a.a "$member" will not be vulnerable if $member expands to --output=/home/user)

ruiu added a comment.Nov 5 2019, 6:35 PM

So, I think Nick Clifton's modification to not clobber parent directories is a good safeguard, and we should implement the same thing to protect our users. What do you think?

I think disallowing .. and absolute paths of archive members is sufficient. Both --output=../dir and --output=/abs/path can be allowed. The rationale is that --output is and should be controlled by the developer, while paths of archive members are more or less uncontrollable.

We may have already disallowed .. and absolute paths of archive members: error: truncated or malformed archive (string table at long name offset 0not terminated), but the pointer arithmetic in lib/Object/Archive.cpp is a bit complex and I cannot confirm for now.

Oh OK. Do we have a test for that? I'd make sure that ar x --output=/foo/bar/ baz.a report an error instead of creating /foo/bar.o if bar.a contains ../bar.o.

kongyi added a comment.EditedNov 6 2019, 7:50 AM

I'll create a separate patch to address the argument ordering comment.

ruiu added a comment.Nov 6 2019, 11:21 PM

I'll create a separate patch to address the argument ordering comment.

It is not clear to me whether my concern has already been addressed (and only testcase is missing) or is not implemented. Which is the case? If it's not implemented, maybe we should do that as part of this patch to reduce the risk of getting this feature in to the next release without the safeguard?

MaskRay added a comment.EditedNov 7 2019, 10:19 AM

I think we now need to do:

  • Change the command line parser. llvm-ar x --output=/tmp a.a and llvm-ar --output=dir x a.a (options around the operation) should work, but llvm-ar x a.a --output=dir (options after the archive) should fail.
  • When --output=/tmp/c is specified, let llvm-ar xv a.a print x - /tmp/c/a.o (this is now the GNU ar behavior), instead of x - a.o. In the test, we may need to use --output=%t in the RUN line and x - [[DIR]]/a.o in the CHECK line (FileCheck %s -DDIR=%t; / may be \ on Windows).
  • I think members with .. or absolute paths are already excluded, but I may be wrong. Rejecting .. and absolute paths, and giving a friendly error in a separate patch should be fine.
MaskRay requested changes to this revision.Nov 7 2019, 10:20 AM
This revision now requires changes to proceed.Nov 7 2019, 10:20 AM