This is an archive of the discontinued LLVM Phabricator instance.

[llvm-libtool-darwin] Print a warning if object file names are repeated
ClosedPublic

Authored by Roger on Nov 3 2021, 11:03 AM.

Details

Summary

Print a warning if llvm-libtool-darwin if any of the object
files provided by the user have the same file name.

The tool will now print a warning if there is a name collision across:

  • Two object files
  • An object file and an object file from within a static library
  • Two object files from different static libraries

Here is an example of the error:

$ llvm-libtool-darwin -static -o archive.a out.o out.o
error: file 'out.o' was specified multiple times.
in: out.o
in: out.o

$ llvm-libtool-darwin -static -o archive.a out.o
$ llvm-libtool-darwin -static -o combined.a archive.a out.o
error: file 'out.o' was specified multiple times.
in: archive.a
in: out.o

This change mimics apple's cctools libtool's behavior which always shows a warning in such cases.

Diff Detail

Event Timeline

Roger requested review of this revision.Nov 3 2021, 11:03 AM
Roger created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2021, 11:03 AM
Roger edited the summary of this revision. (Show Details)Nov 3 2021, 11:05 AM

This change does not allow static libraries to be created if any of the object files in that library would have the same file name.

This should be "This change prevents creation of static libraries if any ...". The phrase "does not allow" suggests that the patch does something, but does not make it possible for the static libraries to be created in this situation (implying that a future change would make it possible).

Note that this apple's cctools libtool allows such static libraries to be created and only throws a warning about the name collision.

I'm not an Apple/Mach-O developer, and don't really know much, if anything, about the platform or the file format, so take my thoughts with a pinch of salt. However, I wanted to question this:

  1. For traditional Unix static libraries, the filenames are largely irrelevant - you can have multiple library members without any issue and it'll just work at link time. This is because such library members are picked based on their symbol names, rather than anything to do with their filenames.
  2. If the existing cctools libtool reports a warning rather than an error, why not do the same here? It seems like reporting an error might cause problems where existing users are creating such a library for good reason. NB: if there is no possible good reason, i.e. the library is fundamentally broken, then an error makes sense - see my above point about not being too familiar with the format.
llvm/test/tools/llvm-libtool-darwin/L-and-l.test
75

We often add extra indentation to make output line up nicely after FileCheck directives of different lengths.

Same in the other tests.

98

%T is generally considered deprecated and may be removed at a future point. This is because the directory is not unique to the test, so there could be mysterious runtime failures due to test ordering issues.

To workaround this, cd into a test-time created directory, with a name derived from %t, and then use paths relative to that directory.

llvm/test/tools/llvm-libtool-darwin/archive-flattening.test
62
66

I'd recommend including the "error:" prefix here too.

llvm/test/tools/llvm-libtool-darwin/create-static-lib.test
54–55
llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp
116

Up to you, but I think for STL-like containers, it's acceptable to follow STL-naming conventions with their methods (i.e. push_back here). Others may disagree though :)

540–542

I'd prefer not to use auto here. In fact, for Members and Files, I believe you could (should?) be using ArrayRef<>

543

Is there a reason this is a) ordered and b) using a std::map rather than LLVM's StringMap?

544

Iterators or simply Its (always use UpperCamelCase for variables).

546

LLVM comments are always full sentences ("This should imply iterators.second != Files.cend()."). I'd suggest using C-style block comments too, so that you aren't forced to put this on a new line. Finally, I'd drop the "should" (i.e. "This implies...".

You might want to add an assertion to sanity check that iterators.second != Files.cend()

547

Throughout this function: LLVM style says no braces for single line ifs and loops.

553

Probably rename this to MemberToFiles, rather than the rather meaningless "pair".

557

LLVM coding standards say that error messages shouldn't end in a .

581–583
thevinster added a comment.EditedNov 4 2021, 11:25 PM

I agree with jhenderson@ here. I'm not sure why we want to error on this when it's only warning for cctools libtool. This is a backwards incompatible change for those who want to do a drop in replacement of the tool.

llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp
109–112

These vectors feel a bit strange to me. It almost feels like NewArchiveMember should either support filename in the struct or a much less invasive change is to have this class inherit NewArchiveMember and support the filename field here. We can always downcast it if methods elsewhere require the NewArchiveMember object.

alexander-shaposhnikov requested changes to this revision.Nov 5 2021, 12:45 AM
This revision now requires changes to proceed.Nov 5 2021, 12:45 AM
Roger added inline comments.Nov 6 2021, 12:25 PM
llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp
109–112

The main reason why I went with this approach is because other functions writeArchive and writeArchiveToBuffers depend on getting a std::vector<NewArchiveMember> as an input. If that were not the case, I would have done something like

struct NewArchiveMemberWrapper {
  NewArchiveMember Member;
  StringRef File;
};

and used std::vector<NewArchiveMemberWrapper> instead of std::vector<NewArchiveMember>.

It almost feels like NewArchiveMember should either support filename in the struct or a much less invasive change is to have this class inherit NewArchiveMember and support the filename field here.

I intentionally avoid using inheritance here because, fundamentally, I am not trying to create another type of NewArchiveMember. I am just trying to tie an additional piece of data (a file path) to each NewArchiveMember. Also, if I created a child class (let's call it NewArchiveMemberChild) and used std::vector<NewArchiveMemberChild>, then I would have to do extra work to generate the std::vector<NewArchiveMember> that the writeArchive and writeArchiveToBuffer functions need. Using composition here is simpler and I believe more appropriate.

I also did not add a new StringRef field to NewArchiveMember because the feature I am introducing this field for is specific to llvm-libtool-darwin but NewArchiveMember is not (it is shared among multiple libraries). Adding a new field to NewArchiveMember would be leaking state/complexity relevant only to llvm-libtool-darwin out to other libraries.

These vectors feel a bit strange to me.

I will admit one thing I do not like is that the Members vector and the Files vector are programmed to always have the same size but this invariant is not implied by their types. If we had one vector of NewArchiveMemberWrapper, then this wouldn't be an issue but, again, the existing code does not make this easy. I think it might be do-able if we had the ranges-v3 library or C++20's ranges but we do not :(

Roger updated this revision to Diff 385306.Nov 6 2021, 5:10 PM

Rebasing.

Roger updated this revision to Diff 385310.Nov 6 2021, 5:26 PM

Rebasing

alexander-shaposhnikov requested changes to this revision.Nov 6 2021, 6:32 PM

Before reviewing the details I'd like to better understand the rationale for the proposed changes (especially given the discrepancy with cctools etc), it'd be good to include it into the summary of the diff.

This revision now requires changes to proceed.Nov 6 2021, 6:32 PM
Roger updated this revision to Diff 385364.Nov 7 2021, 12:33 PM

Change this feature from throwing an error to printing a warning.

Roger updated this revision to Diff 385369.Nov 7 2021, 1:36 PM
Roger marked 2 inline comments as done.

Fix and address comments in tests.

Roger updated this revision to Diff 385371.Nov 7 2021, 1:39 PM
Roger marked 3 inline comments as done.

Fix comment.

llvm/test/tools/llvm-libtool-darwin/create-static-lib.test
54–55

I am just removing that line since I'm changing this behavior to print an error.

Roger planned changes to this revision.Nov 7 2021, 1:39 PM

This change was motivated by an issue observed internally where llvm-libtool silently accepting two input files with the same basename and putting them into the same archive led to a crash. This was ultimately down to an ODR violation on our end, but we noticed that cctools' libtool gives a warning in the same scenario, which at least gives the user some hint of what might be going on.

We originally thought of an emitting an error instead of a warning because we couldn't think of a good reason you'd want two files with the same basename in the same archive (it also results in confusing linker diagnostics, ar extractions, etc.), and because warnings are super easy to miss in build output. However, there might be cases where this situation arises naturally (especially with third-party archives out of your control), so matching cctools behavior and emitting a warning instead of an error does make more sense (and we could always add an option to upgrade the warning to an error if we desired).

Roger updated this revision to Diff 385559.Nov 8 2021, 10:35 AM

Fix test

Roger planned changes to this revision.Nov 8 2021, 11:15 AM

This change was motivated by an issue observed internally where llvm-libtool silently accepting two input files with the same basename and putting them into the same archive led to a crash. This was ultimately down to an ODR violation on our end, but we noticed that cctools' libtool gives a warning in the same scenario, which at least gives the user some hint of what might be going on.

We originally thought of an emitting an error instead of a warning because we couldn't think of a good reason you'd want two files with the same basename in the same archive (it also results in confusing linker diagnostics, ar extractions, etc.), and because warnings are super easy to miss in build output. However, there might be cases where this situation arises naturally (especially with third-party archives out of your control), so matching cctools behavior and emitting a warning instead of an error does make more sense (and we could always add an option to upgrade the warning to an error if we desired).

Thanks fo the explanation, it makes sense to me.

Roger updated this revision to Diff 385913.Nov 9 2021, 11:42 AM

Rebase to new parent

Roger planned changes to this revision.Nov 9 2021, 12:06 PM
Roger planned changes to this revision.Nov 9 2021, 12:29 PM
Roger updated this revision to Diff 385942.Nov 9 2021, 12:45 PM

Stop manually passing FileName around.

Roger planned changes to this revision.Nov 9 2021, 12:46 PM
Roger retitled this revision from [llvm-libtool-darwin] Throw an error if object file names are repeated to [llvm-libtool-darwin] Print a warning if object file names are repeated.
Roger updated this revision to Diff 385981.Nov 9 2021, 2:40 PM

Update asserts

Roger planned changes to this revision.Nov 9 2021, 2:41 PM
Roger updated this revision to Diff 385984.Nov 9 2021, 2:44 PM

Clang format

Roger planned changes to this revision.Nov 9 2021, 2:45 PM
Roger updated this revision to Diff 386301.Nov 10 2021, 1:30 PM

Small changes

Roger planned changes to this revision.Nov 10 2021, 1:39 PM
Roger edited the summary of this revision. (Show Details)Nov 10 2021, 2:53 PM
Roger edited the summary of this revision. (Show Details)
Roger updated this revision to Diff 386328.Nov 10 2021, 2:56 PM
Roger marked 9 inline comments as done.

Respond to comments

Roger updated this revision to Diff 386329.Nov 10 2021, 2:59 PM

Addressing comments

Roger updated this revision to Diff 386332.Nov 10 2021, 3:05 PM

Formatting

Roger added inline comments.Nov 10 2021, 3:47 PM
llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp
540–542

Done.

543

You're right I shouldn't use an ordered map. I didn't know about StringMap (am very new to the LLVM codebase). I'll use it :)

Roger added a comment.Nov 10 2021, 3:54 PM

I'm not an Apple/Mach-O developer, and don't really know much, if anything, about the platform or the file format, so take my thoughts with a pinch of salt. However, I wanted to question this:

  1. For traditional Unix static libraries, the filenames are largely irrelevant - you can have multiple library members without any issue and it'll just work at link time. This is because such library members are picked based on their symbol names, rather than anything to do with their filenames.
  2. If the existing cctools libtool reports a warning rather than an error, why not do the same here? It seems like reporting an error might cause problems where existing users are creating such a library for good reason. NB: if there is no possible good reason, i.e. the library is fundamentally broken, then an error makes sense - see my above point about not being too familiar with the format.

I changed the behavior to output a warning rather than throw an error for the reasons provided above

Some nits, but I haven't done a detailed review, and am leaving that to those actually invested in this tool.

llvm/test/tools/llvm-libtool-darwin/archive-flattening.test
69

Missing warning: prefix.

llvm/test/tools/llvm-libtool-darwin/create-static-lib.test
66
llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp
559–561

Nit: delete braces for single-line loop.

Roger updated this revision to Diff 386589.Nov 11 2021, 10:49 AM
Roger marked 3 inline comments as done.

Address comments

llvm/test/tools/llvm-libtool-darwin/archive-flattening.test
69

This stanza is all part of one warning message so only the first line will have the prefix.

smeenai accepted this revision.Dec 1 2021, 2:33 PM

LGTM, thanks!

Roger updated this revision to Diff 399078.Jan 11 2022, 2:20 PM

Insert missing period from warning.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 11 2022, 2:50 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Jan 11 2022, 7:20 PM

The test fails on Windows: http://45.33.8.238/win/52360/step_11.txt

Please take a look, and revert for now if it takes a while to fix.

Thanks for the heads up. We'll need to account for forward vs. back slashes in the test. I reverted for now and will recommit with a fix.

The Windows test failure here is pretty annoying :(

For the test where we specify one file via -l and the other via its path, the output on Windows looks like

warning: file 'L-and-l.test.tmp-input1.o' was specified multiple times.
in: C:\Users\smeenai\llvm-project\build\Release\test\tools\llvm-libtool-darwin\Output\L-and-l.test.tmp/copy/L-and-l.test.tmp-input1.o
in: C:\Users\smeenai\llvm-project\build\Release\test\tools\llvm-libtool-darwin\Output\L-and-l.test.tmp/copy\L-and-l.test.tmp-input1.o

Note that the last slash is a forward slash in one path and a backslash in the other. The forward slash is coming from our path specification in the test, and the backslash is coming from sys::path::append in libtool (to concatenate -L directories and -l names).

One way to fix this would be to introduce a new %{sep} substitution in lit (similar to the existing %{pathsep}), so we could use the appropriate slashes for the platform in our test automatically. Another would be to use sys::path::Style::windows_slash in our sys::path::append call, but that could break with \\?\ paths, which can't use forward slashes.

@mstorsjo, I believe you have a bunch of experience dealing with slash differences on Windows. Any suggestions on the best way to proceed here?

For the test where we specify one file via -l and the other via its path, the output on Windows looks like

warning: file 'L-and-l.test.tmp-input1.o' was specified multiple times.
in: C:\Users\smeenai\llvm-project\build\Release\test\tools\llvm-libtool-darwin\Output\L-and-l.test.tmp/copy/L-and-l.test.tmp-input1.o
in: C:\Users\smeenai\llvm-project\build\Release\test\tools\llvm-libtool-darwin\Output\L-and-l.test.tmp/copy\L-and-l.test.tmp-input1.o

Note that the last slash is a forward slash in one path and a backslash in the other. The forward slash is coming from our path specification in the test, and the backslash is coming from sys::path::append in libtool (to concatenate -L directories and -l names).

One way to fix this would be to introduce a new %{sep} substitution in lit (similar to the existing %{pathsep}), so we could use the appropriate slashes for the platform in our test automatically. Another would be to use sys::path::Style::windows_slash in our sys::path::append call, but that could break with \\?\ paths, which can't use forward slashes.

@mstorsjo, I believe you have a bunch of experience dealing with slash differences on Windows. Any suggestions on the best way to proceed here?

Hmm... Overall I'm not a huge fan of tests like FileCheck -DSOMETHING=%t/suffix or similar, because as we've noticed, the FileCheck literal checks are quite brittle for such things - especially as we can configure which kind of separator that sys::path::append uses too. Although I guess we could mirror that into the lit subtsitutions too. (Also, for such cases, we could maybe make other lit substitutions, like %t, use the same path separator as is configured to be the default for sys::path - that would help fix a bunch of test cases that still are broken with forward slash on Windows.) But it's quite unintuitive to write the match patterns for such cases, to try to mimic exactly what combination of slashes you end up with in each case (when considering people writing tests on unix without access to testing thigs on Windows.)

I don't think considering using sys::path::Style::windows_slash explicitly is the right way to go either. I don't think we should complicate the code to do something non-obvious just for the sake of making the tools produce the path that the test framework expects.

For cases within code itself, I've considered adding a sys::path::equals that accepts either slash direction, and which would be case insensitive on windows (and darwin?). I wonder how hard it would be to add something similar to FileCheck. I don't offhand know exactly how it would work... But I think something like that would be good for easily writing testcases on unix, without spending lots of unnecessary fiddling on making them pass on Windows.

For the test where we specify one file via -l and the other via its path, the output on Windows looks like

warning: file 'L-and-l.test.tmp-input1.o' was specified multiple times.
in: C:\Users\smeenai\llvm-project\build\Release\test\tools\llvm-libtool-darwin\Output\L-and-l.test.tmp/copy/L-and-l.test.tmp-input1.o
in: C:\Users\smeenai\llvm-project\build\Release\test\tools\llvm-libtool-darwin\Output\L-and-l.test.tmp/copy\L-and-l.test.tmp-input1.o

Note that the last slash is a forward slash in one path and a backslash in the other. The forward slash is coming from our path specification in the test, and the backslash is coming from sys::path::append in libtool (to concatenate -L directories and -l names).

One way to fix this would be to introduce a new %{sep} substitution in lit (similar to the existing %{pathsep}), so we could use the appropriate slashes for the platform in our test automatically. Another would be to use sys::path::Style::windows_slash in our sys::path::append call, but that could break with \\?\ paths, which can't use forward slashes.

@mstorsjo, I believe you have a bunch of experience dealing with slash differences on Windows. Any suggestions on the best way to proceed here?

Hmm... Overall I'm not a huge fan of tests like FileCheck -DSOMETHING=%t/suffix or similar, because as we've noticed, the FileCheck literal checks are quite brittle for such things - especially as we can configure which kind of separator that sys::path::append uses too. Although I guess we could mirror that into the lit subtsitutions too. (Also, for such cases, we could maybe make other lit substitutions, like %t, use the same path separator as is configured to be the default for sys::path - that would help fix a bunch of test cases that still are broken with forward slash on Windows.) But it's quite unintuitive to write the match patterns for such cases, to try to mimic exactly what combination of slashes you end up with in each case (when considering people writing tests on unix without access to testing thigs on Windows.)

I don't think considering using sys::path::Style::windows_slash explicitly is the right way to go either. I don't think we should complicate the code to do something non-obvious just for the sake of making the tools produce the path that the test framework expects.

For cases within code itself, I've considered adding a sys::path::equals that accepts either slash direction, and which would be case insensitive on windows (and darwin?). I wonder how hard it would be to add something similar to FileCheck. I don't offhand know exactly how it would work... But I think something like that would be good for easily writing testcases on unix, without spending lots of unnecessary fiddling on making them pass on Windows.

Thanks for the suggestions!

I'm pretty short on time right now, so I just loosened the test conditionals to make it pass. I agree that a slash-agnostic path comparison would be super useful for FileCheck. I imagine you could implement it via regex (which FileCheck already has support for), but I've never looked into FileCheck's code much to say for sure.

The required FileCheck regex pattern would be {{\\/}} or something to that effect, I believe. Could we add a lit substitution, similar to %\t or %/t, which is specifically for use within FileCheck patterns, so uses the regex pattern, maybe?