Page MenuHomePhabricator

[AIX] Support of Big archive (read)
Needs ReviewPublic

Authored by DiggerLin on Oct 15 2021, 7:13 AM.


Group Reviewers
Restricted Project

The patch is based on the EGuesnet's implement of the "Support of Big archive (read)
the first commit of the patch is come from

the rest of commits of the patch

1  Addressed the comments on the
2  according to

using the "fl_fstmoff" for the first object file number, using "char ar_nxtmem[20]" to get next object file , using the "char fl_lstmoff[20]" for the last of the object file will fix the following problems:

   2.1 can not correct reading a archive files which has padding data between too object file
   2.2 can not correct reading a archive files from which some object file has be deleted
3 introduce a new derived class BigArchive for big ar file.

Diff Detail

Unit TestsFailed

40 msx64 debian > LLVM.Object::archive-big-extract.test
Script: -- : 'RUN: at line 2'; rm -rf /var/lib/buildkite-agent/builds/llvm-project/build/test/Object/Output/archive-big-extract.test.tmp && mkdir -p /var/lib/buildkite-agent/builds/llvm-project/build/test/Object/Output/archive-big-extract.test.tmp/extracted/
30 msx64 debian > LLVM.Object::archive-big-print.test
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-ar p /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Object/Inputs/aix-big-archive.a evenlen | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Object/archive-big-print.test --check-prefix=AIXBIG --implicit-check-not={{.}}
20 msx64 debian > LLVM.Object::archive-big-read.test
Script: -- : 'RUN: at line 2'; env TZ=GMT /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-ar tv /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Object/Inputs/aix-big-archive.a | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Object/archive-big-read.test --strict-whitespace --check-prefix=AIXBIG --implicit-check-not={{.}}

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

added llvm-objdump -a test case for big archive.

For the 2 binary files being added, I think we should append xcoff- in the names.


I think James is suggesting that the 'extract' in the test name is misplaced since we are not in fact extracting anything. Instead how about naming the test something along the lines of big-archive-print.test.

I think we should add "x" option in the llvm/test/tools/llvm-ar/extract.test after implement write big archive.

It is nice to reuse the existing tests for the new format, but as you pointed out we can't until we implement writing big archive files first. If you add an extract test in this directory (or even as a second runstep in this file), with the binary file already being added in this test then we get coverage now, and don't have to wait until we implement big archive writing.

DiggerLin marked an inline comment as done.Oct 26 2021, 2:13 PM
DiggerLin added inline comments.

if I added the -x to extract test the archive file Input/libtest.a (the archive file also is used in ) in the test case. I will use "diff command" to diff the extracting object file with original object file. I think I have to add two original object files into directory llvm/test/Object/Input for compare . Adding a extra object file is not a good idea unless we need it. if you strong suggestion I add the -x option with Input/libtest.a. I will add it.

DiggerLin marked an inline comment as done.Oct 26 2021, 2:39 PM
DiggerLin added inline comments.

I think I can use bigFormat.a for option -x , I will add -x option test. thanks

DiggerLin added a comment.EditedOct 27 2021, 7:42 AM

For the 2 binary files being added, I think we should append xcoff- in the names.

I changed the name "llvm/test/tools/llvm-objdump/XCOFF/Inputs/libtest.a" --> big-archive-libtest.a (for the file already in the XCOFF/Input directory, I do not we need to add xcoff- as prefix

I change llvm/test/Object/Inputs/bigFormat.a --> llvm/test/Object/Inputs/aix-big-archive.a

DiggerLin updated this revision to Diff 382677.Oct 27 2021, 7:55 AM

address comment and added a new extract test case.

gentle ping, @jhenderson , if you have time, can you help to continue to review the patch ?

gentle ping, @jhenderson , if you have time, can you help to continue to review the patch ?

Sorry for the delay. I'll take a look early next week. Out of time for more reviews today.

I've made a start on continuing this review. Will try to get through the rest of the file tomorrow.


Whilst I agree that the isEmpty function is not written well, I think there's a wider problem within the archive code where it assumes that the magic will always be the same size. It currently is, for all supported archive types, but that doesn't mean it'll always be the case. Better would be to have the magic stored within the archive, or at least the magic length (or have a function that switches based on archive kind and returns the [length of the] magic). This would be an unrelated refactor.


Conceptually, the kind of archive is important, not the class it is represented in.


It seems to me like this should be a pure virtual function, rather than returning 0 here - another archive kind in the future should explicitly say what the size of its member headers are (even if it is zero). By providing a default implementation, there's a real risk some future implementation won't override this, and will then have problems.


Please move data to one place, rather than sandwiched between class methods.

Also, does this need to be public? It wasn't before...


Make this protected, and move it to near the top of the class, by the destructor, where constructors usually live.


Perhaps add a comment for this class indicating what this archive format is used for.


Add a blank line before this function.


Separate unrelated functions with blank lines.


If you don't want to do that change in this patch (which is fair enough), could you please do it in a separate precursor patch to this, please?

I'd probably not call the concrete class GNUArchive, since it will, at least for now, act as the concrete class used for a number of closely-related archive formats, not all of which are GNU-related. UNIXArchive might be more accurate. In the future, we might end up splitting that further into additional types, e.g. COFFArchive, GNUArchive, BSDArchive etc).


I think the earlier name is fine for this. I'd call it "FixLenHdrType" or ideally just "FixLenHdr" in fact ("Type" on the end of a class name is ugly, and doesn't really add anything. No need to prefix "BigAr", since you're already nested in BigArchive at this point.


Let's make these consistent with the getters: FirstChildOffset and LastChildOffset.

95 ↗(On Diff #381028)

I think it would be good to have at least one test for each of the binutils with a BigArchive input. It's probably sufficient to add that in a separate patch - I'd prefer we be able to generate these input files from a combination of yaml2obj and llvm-ar, so we will need the writing patch to do so.


Does the code work without this? We tend not to add includes unless they're actually needed to make the code work.


I'd be surprised if this header is actually needed, especially the C-style one.


If I'm reading this rightly, this entire block could be moved into common code with the regular archive vareity. If it's not possible to put it into the base class constructor, put it in a helper function or method instead.


Don't have trailing blank lines at ends of functions.


Also consider adding comments to name the nullptr and 10 values, e.g. strtol(ArMemHdr->NameLen, /*endptr=*/nullptr, /*base=*/10);

The regular archive kind has various safety checks to make sure the number read makes sense. For example, it checks to make sure there's actually a number in the field. We also need to show that the name's start and end are within the archive buffer, otherwise we'll get crashes/reading past the end of the file etc.


At the moment, this should be a cantFail, since no error can actually get here, but see also my above comment re. validation of the raw name.

jhenderson added inline comments.Fri, Nov 12, 1:31 AM

Here and in similar functions, don't repeat this call: do it once and store in a variable (like you already do for the RawSize in getSize


Put variable declarations close to their first usage.

I'm a little confused what the name length has to do wtih the size? Add a comment to explain this near the end of this function, I suggest.


The logic for calculating and checking RawSize is identical to the regular archive logic. Pull it into a common function.


Seems like this should be a getNameLen method?


Another very similar function to earlier functions. Consider sharing the logic.


Yes, I can see that this does that, since it's in the function name. Don't bother with comments that essentially describe the same thing as the function name.


As above - pull into a common function. This looks basically identical to the getRawAccessMode function, so you should be able to do something to avoid duplicating the logic (maybe use a template for the LastModified's/AccessMode's type if needed.


The logic in this and related functions is very similar. I wonder if you could pull it into a function or callable class, parameterised on things like the field to check and the size and name of the field?


Why RawStringRef? That doesn't sound like it's got anything to do with an offset...


I'm confused why you've bothered to change the logic here, but didn't with getUID. Are you sure you want to change it here? If so, why not in getUID too?

Also, have you actually changed the behaviour with this? It looks suspicious to me.


Should this be "offset field"?


These two functions are identical. Could they be pushed into the base class? You could pass in the pointer to get the offset from.


Same as above. Don't duplicate logic - use functions! In this case, a function that takes an AccessMode field (whatever the type of that field is) should be plenty sufficient.


Same comments as above.


As above.


This block is actually another good reason why you should change the parent class to be two separate concrete classes - each class could have a function which returns the appropriate member header type using the Factory pattern.


See above re. moving this logic into the parent class, but also I wonder if you could share this logic with the other constructor somehow (it may not be that simple).


I'm struggling a little to understand the grammar with this sentence. I think you're trying to say something like "The actual start of the file is after the name and any necessary even-alignment padding." or something to that effect.


Don't have blank lines at start of functions.


I'd flip this logic: put the Big Archive special case first, and then the common case, so that the conditional doesn't need to be a negative.

That being said, this may be pointing to the fact that Child needs to be a class hierarchy too, similar to my other points.


I'm not sure you need this comment. I think the code is pretty self explanatory.


You can use cast when you know the types will match. This will result in an assertion if they don't (which is fine), rather than a null pointer.


Flip this so you can avoid the else.


Get rid of the blank lines between these local variable declarations.


Make Ret a unique_ptr<Archive> and use std::make_unique here and below. Otherwise, you have a memory leak if Err is reported.


Stare at this code a minute and see if you can spot the bug... (hint, what is the value of Format before and after if this is a BigArchive?)


Change this name. Do you understand what the variable is supposed to represent, because this name makes me think you don't...?


There are still unnecessary braces in this if.


Get rid of the extra spaces, although actually I think you should do the cd on the same line as the directory creation.


The comment markers aren't necessary yet, but in a future version, where you create the inputs on the fly via yaml2obj etc, you will need them.

No need for --check-prefix when there's only one FileCheck run in a file: use just the default instead.


In the future, the input will hopefully be generated on the fly rather than using a canned binary.

Also, add comment markers and don't use check-prefix, as above.

1 ↗(On Diff #382677)

See my earlier comment re. let's not add this test just yet (we don't want to add binaries to the git repo if we can avoid it, and I think testing it should be in all tools that can take a Big Archive, once we have write support).


I think you can get rid of this blank line.

DiggerLin marked 32 inline comments as done.Thu, Nov 18, 11:54 AM
DiggerLin added inline comments.
95 ↗(On Diff #381028)

I will remove the change from the patch, the change do not need from the patch, it will need on when I will implement the change in the create export list.


it can not put into the base class constructor , for the definition of ArMemHdrType is different in the BigArchiveMemberHeader and ArchiveMemberHeader.


after address the comment, the getRawName , maybe return a Error.


I have think over to use a common function before. but
for the definition of ArMemHdr is different in ArchiveMemberHeader and BigArchiveMemberHeader , it is difficult to put into a common.

a lot of duplicated code are caused by the same reason.
If change the name of ArMemHdr in the BigArchiveMemberHeader to "BigArMemHdr " , it maybe be more easy to understand the reason.


it can not pushed into the base class and pass in the pointer to get offset from.

When you call from the Child , for example
Header is AbstractArchiveMemberHeader, it do not have ArMemHdr


the ArMemHdr are defined different in the ArchiveMemberHeader and BigArchiveMemberHeader to use a function to take the raw string of AccessMode.

DiggerLin updated this revision to Diff 388295.Thu, Nov 18, 1:03 PM
DiggerLin marked 7 inline comments as done.

address comment

DiggerLin edited the summary of this revision. (Show Details)Thu, Nov 18, 1:04 PM
DiggerLin marked an inline comment as done.Thu, Nov 18, 1:09 PM
DiggerLin added inline comments.

I will create another patch for it.

DiggerLin marked an inline comment as done.Fri, Nov 19, 12:23 PM

I've stopped reviewing partway through this. There are a number of my previous comments that you've marked as done but haven't been addressed. Can you confirm that you've updated this to the latest diff you'd like me to review, please, before I proceed further?


This has been marked as done, but isn't addressed in the latest diff. Please don't mark things as done until they've been addressed in an uploaded diff (if you select the Mark as Done checkbox, in the UI, and then upload the diff, the checked boxes will be submitted automatically when you upload the diff).


Also, does this need to be public? It wasn't before...

This bit hasn't been addressed.


If I'm not mistaken, "aix" is usually written as "AIX", so we should do the same here (I may easily be wrong though).


I'd group these accessor functions as getRaw... in one block, and then get... (without raw) in another block, with a blank line separating teh two. I'd also suggest ordering within the blocks to match each other, as close as practical.


Marked as Done but not addressed.


Please address all clang-format problems in your modified/new code.

  1. Lower-camel-case names for functions.
  2. "create" is a more common term than "generate" for these sort of functions.

Is this safe and correct given that this has been done at the start of the calling code too?


Make this a StringRef (or event a Twine I believe would work), rather than std::string.


I believe if you flip these around, you can do the suggested inline edit. Also note I added the braces for the old else (now the new if part), as I believe the consensus is that if you use braces for an if, you should for all its corresponding else parts too (and vice versa).


I don't believe you need this if anymore, right?


Isn't that what the getSizeOf function is for?


I don't believe you need this if anymore?


Marked as done, but not addressed.