This is an archive of the discontinued LLVM Phabricator instance.

Produce cpio files for --reproduce
ClosedPublic

Authored by rafael on May 3 2016, 8:27 AM.

Details

Reviewers
ruiu
Summary

We want --reproduce to

  • not rewrite scripts and thin archives
  • work with absolute paths

Given that, it pretty much has to create a full directory tree. On windows that is problematic because of the very short maximum path limit. On most cases users can still work around it with "--repro c:\r", but that is annoying and not viable for automated testing.

We then need to produce some form of archive with the files. The first option that comes to mind is .a files since we already have code for writing them. There are a few problems with them

  • The format has a dedicated string table, so we cannot start writing it until all members are known.
  • Regular implementations don't support creating directories. We could make llvm-ar support that, but that is probably not a good idea.

The next natural option would be tar. The problem is that to support long path names (which is how this started) it needs a "pax extended header" making this an annoying format to write.

The next option I looked at seems a natural fit: cpio files.

They are available on pretty much every unix, support directories and long path names and are really easy to write. The only slightly annoying part is a terminator, but at least gnu cpio only prints a warning if it is missing, which is handy for crashes. This patch still makes an effort to always create it.

Diff Detail

Event Timeline

rafael updated this revision to Diff 56002.May 3 2016, 8:27 AM
rafael retitled this revision from to Produce cpio files for --reproduce.
rafael updated this object.
rafael added a reviewer: ruiu.
rafael added a subscriber: llvm-commits.
rafael updated this revision to Diff 56007.May 3 2016, 8:30 AM

Fix test.

Ah, another interesting format would be uncompressed .zip.

It is somewhat similar to cpio in structure, but instead of just a terminator it has a more complicated summary in the end, which makes the file less useful if we crash before finishing it.

In any case, converting to zip would reuse the general structure in this file, so we should probably do this first.

ruiu edited edge metadata.May 3 2016, 8:50 AM

Very interesting. The choice I thought after "ar" was tar, but cpio seems to be indeed simpler than tar. I knew that the file format exists but I didn't know the details about it.

I'd probably define Cpio class, but this is fine as well.

ELF/DriverUtils.cpp
130

If you are not going to fix it, it is not a fixme. Please change the comment to say that this should be unique, but since no one cares, we don't bother to set a unique number.

149–154

When storing pathnames to an archive, you don't want to prepend Config->Reproduce string, no? For example, if you are storing /home/ruiu/foo.obj, then you store the file as that path, instead of /home/ruiu/repro/home/ruiu/foo.obj.

206

Ditto. You want to store this as just "response.txt" instead of a full path.

ELF/Error.cpp
32

You want to do this for fatal() as well.

test/ELF/reproduce-linkerscript.s
9

I'm not sure if cpio is available everywhere. If not, you need to define "cpio" and "REQUIRES: cpio".

rafael updated this revision to Diff 56009.May 3 2016, 9:09 AM
rafael edited edge metadata.

address review comments.

Comment at: ELF/DriverUtils.cpp:130
@@ +129,3 @@
+ OS << "000000"; c_dev
+ OS << "000000";
c_ino // FIXME: should be unique

+ OS << "100664"; // c_mode: C_ISREG | rw-rw-r--

If you are not going to fix it, it is not a fixme. Please change the comment to say that this should be unique, but since no one cares, we don't bother to set a unique number.

Done

Comment at: ELF/DriverUtils.cpp:145-146
@@ -122,4 +144,4 @@
+void elf::maybeCopyInputFile(StringRef Src, StringRef Data) {

std::string Dest = getDestPath(Src);
  • StringRef Dir = path::parent_path(Dest);
  • if (std::error_code EC = fs::create_directories(Dir)) {
  • error(EC, Dir + ": can't create directory");

+ maybePrintCpioMember(Dest, Data);

+}

When storing pathnames to an archive, you don't want to prepend Config->Reproduce string, no? For example, if you are storing /home/ruiu/foo.obj, then you store the file as that path, instead of /home/ruiu/repro/home/ruiu/foo.obj.

It should be repro/home/ruiu/foo.obj. That way extracting the archive
only creates one directory and you can fakeroot into it.

Comment at: ELF/DriverUtils.cpp:202
@@ +201,3 @@
+ SmallString<128> Dest;
+ path::append(Dest, Config->Reproduce, "response.txt");

+ maybePrintCpioMember(Dest, Data);

Ditto. You want to store this as just "response.txt" instead of a full path.

This is just repro/response.txt.

Comment at: ELF/Error.cpp:32
@@ -31,2 +31,3 @@

HasError = true;

+ maybeCloseReproArchive();

}

You want to do this for fatal() as well.

Done

Comment at: test/ELF/reproduce-linkerscript.s:9
@@ -8,2 +8,3 @@

  1. RUN: ld.lld build/foo.script -o bar --reproduce repro

+# RUN: cpio -id < repro.cpio

  1. RUN: diff build/foo.script repro/%:t.dir/build/foo.script ----------------

I'm not sure if cpio is available everywhere. If not, you need to define "cpio" and "REQUIRES: cpio".

It is included in the gnuwin32 package that we require for testing on
windows, so I think we are good.

I will upload a new patch in one sec.

Cheers,
Rafael


llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

ruiu accepted this revision.May 3 2016, 9:17 AM
ruiu edited edge metadata.

LGTM with a nit.

Thank you for doing this. This should be much more reliable than moving files on the actual file system.

ELF/Driver.cpp
267

You can move maybeCloseReproArchive here because all input files (referred directly or indirectly) should be read until this point.

494

Remove.

This revision is now accepted and ready to land.May 3 2016, 9:17 AM
ruiu added a comment.May 3 2016, 9:37 AM

I noticed one more thing. The test will probably fail on the bot because of it.

test/ELF/reproduce.s
8

It may fail to extract files under the temporary directory because it could exceeds 255 characters.


You can move maybeCloseReproArchive here because all input files (referred directly or indirectly) should be read until this point.

That is too early. We haven't even added _start to the undefined list yet.

Cheers,
Rafael

emaste added a subscriber: emaste.May 3 2016, 10:49 AM
emaste added inline comments.
test/ELF/reproduce-linkerscript.s
9

I believe cpio was specified by POSIX at one point but was dropped later on due to limitations in the format and for most normal archiving cases tar is a better choice.

Anyhow, I suspect cpio will be widely, but not universally available and likely does need a REQUIRES: cpio

FWIW, with BSD cpio (libarchive) tests pass with the changes here.

The cpio command was dropped, but the format is still documented. The pax
command has to support it.

The cpio command was dropped, but the format is still documented. The pax command has to support it.

Ah, yes - although we should perhaps use pax in the tests instead of cpio then?

Eugene.Zelenko added a subscriber: Eugene.Zelenko.

Committed in r268404.