This is an archive of the discontinued LLVM Phabricator instance.

[AIX] Support of Big archive (read)
Needs RevisionPublic

Authored by EGuesnet on Apr 16 2021, 7:58 AM.

Details

Reviewers
rupprecht
MaskRay
jhenderson
gbreynoo
DiggerLin
Group Reviewers
Restricted Project
Summary

Add support of AIX Big Archive. Read only.

Diff Detail

Event Timeline

EGuesnet created this revision.Apr 16 2021, 7:58 AM
EGuesnet requested review of this revision.Apr 16 2021, 7:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 16 2021, 7:58 AM
EGuesnet edited the summary of this revision. (Show Details)Apr 16 2021, 8:00 AM
hubert.reinterpretcast retitled this revision from Support of Big archive (read) to [AIX] Support of Big archive (read).Apr 16 2021, 9:59 AM
EGuesnet updated this revision to Diff 339243.Apr 21 2021, 8:07 AM

Deal correctly with memory (destructor, operator=), add test. No regression found.
It is OK for review in point of view.

EGuesnet edited the summary of this revision. (Show Details)Apr 21 2021, 8:08 AM

Added a few others that review binutils. Thanks for the patch!

Looks like the Windows pre-merge check is failing: https://reviews.llvm.org/harbormaster/unit/view/554528/. I can reproduce locally with ninja check-llvm-object. It looks like the archive is empty? Maybe try regenerating it and uploading the patch again.

$ file llvm/test/Object/Inputs/Big.a
llvm/test/Object/Inputs/Big.a: empty

While building this, there's a warning about an unhandled switch:

/home/rupprecht/src/llvm-project/llvm/lib/Object/ArchiveWriter.cpp:132:11: warning: enumeration value 'K_XCOFF' not handled in switch [-Wswitch]
  switch (Kind) {
          ^
/home/rupprecht/src/llvm-project/llvm/lib/Object/ArchiveWriter.cpp:197:11: warning: enumeration value 'K_XCOFF' not handled in switch [-Wswitch]
  switch (Kind) {
          ^

(Didn't get around to reviewing the main source changes in depth yet...)

llvm/test/Object/archive-big-read.test
1 ↗(On Diff #339243)

Instead of cd'ing to the source tree, these commands should be executed in the build tree -- i.e. drop this line, and change below to llvm-ar tv %p/Inputs/Big.a

3 ↗(On Diff #339243)

nit: it isn't technically required, but this should start with a comment (i.e. # Test reading ...) to separate it more clearly from run/check commands

EGuesnet updated this revision to Diff 339931.Apr 23 2021, 1:30 AM

Add bianry file

I've not got the time to review this properly currently.

Of the brief glance I gave this so far:

  1. There doesn't seem to be as much testing as I'd expect given the amount of new code.
  2. Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).
  3. I'm not a fan of the complication of the existing archive header code by converting it into a virtual hierarchy. That being said, I haven't looked at the rest of the code properly to give any thought as to whether there's a better approach.

Finally, what's the use-case? Why do you need this functionality?

EGuesnet updated this revision to Diff 339933.Apr 23 2021, 1:36 AM
This comment was removed by EGuesnet.
EGuesnet updated this revision to Diff 339936.Apr 23 2021, 1:41 AM
EGuesnet marked 2 inline comments as done.
This comment was removed by EGuesnet.
EGuesnet updated this revision to Diff 339941.Apr 23 2021, 1:50 AM

[AIX] Support of Big archive (read)

Harbormaster completed remote builds in B100503: Diff 339936.
EGuesnet added a comment.EditedApr 23 2021, 2:18 AM

First, sorry for the noise, I do not master arc yet.

I have updated the test file according comments.

Looks like the Windows pre-merge check is failing: https://reviews.llvm.org/harbormaster/unit/view/554528/. I can reproduce locally with ninja check-llvm-object. It looks like the archive is empty? Maybe try regenerating it and uploading the patch again.

I have updated it correctly. Thanks.

While building this, there's a warning about an unhandled switch:

It occurs on ArchiveWriter.cpp. I implement a way to write Big Archive. If you think it is better, I can add some code to remove warning, but my goal is to implement read and write for Big Archive, so it will be corrected in a future commit / PR.

There doesn't seem to be as much testing as I'd expect given the amount of new code.

Read archive has list (t), extract (x) and print (p) operation. x and p are the same code behind. I test t and p. What do you want to add?

Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).

I agree it should be avoided. Unfortunately, we cannot create archive at run time, as the write operation is not implemented yet (I work on it). Moreover, yaml2obj has not been ported on AIX (XCOFF object format). In future, when write operation will be implement for Big Archive, binary should be delete.

I'm not a fan of the complication of the existing archive header code by converting it into a virtual hierarchy. That being said, I haven't looked at the rest of the code properly to give any thought as to whether there's a better approach.

We must deal with two struct with different size (ArMemHd). I have not found another clean way to deal with. Moreover, Big Archive are quite a lot different than other archive. Terminator is cosmetic, Fix Size Header is different than Object Header, way to deal with object name is totally different and so on...

Finally, what's the use-case? Why do you need this functionality?

I want to port Rust on AIX. Rust needs archive and it uses llvm library to deal with it. It is not possible (or really hard) to use an external tool, like system ar. So, I must implement read and write support for Big Archive (AIX archive). This PR is the first part: read Big Archive.
As far as I know, it will also be useful to port Clang on AIX, as it will permit to port the CreateExportLists tool. This tool is used to export symbol, a required step to produce correctly binaries on AIX.

EGuesnet updated this revision to Diff 345415.EditedMay 14 2021, 5:43 AM

Updating D100651: [AIX] Support of Big archive (read)

Previous version made the assumption libLLVM was loaded once per archive read. It was OK with llvm-ar and all check I have performed, but it is wrong with rust: rust loads libLLVM, and then loads all libraries.

I have corrected this by removing a static variable and by changing way llvm checks if it is the Fixed Size Header or not.

EGuesnet updated this revision to Diff 347942.EditedMay 26 2021, 5:59 AM

[AIX] Support of Big archive (read)

One line change to decrease usage of CurrentLocation static variable.

While building this, there's a warning about an unhandled switch:

It occurs on ArchiveWriter.cpp. I implement a way to write Big Archive. If you think it is better, I can add some code to remove warning, but my goal is to implement read and write for Big Archive, so it will be corrected in a future commit / PR.

Many people build with -Werror, so it would be good to handle the case even if it's explicitly ignored, like:

case K_XCOFF:
  llvm_unreachable("Not handled yet");

Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).

I agree it should be avoided. Unfortunately, we cannot create archive at run time, as the write operation is not implemented yet (I work on it). Moreover, yaml2obj has not been ported on AIX (XCOFF object format). In future, when write operation will be implement for Big Archive, binary should be delete.

Generally I agree that prebuilt binaries should be avoided for testing (mostly due to being opaque), but there's also the case that we should have at least one prebuilt binary that we use to avoid symmetric bugs (imagine a bug in a bad archive being created being missed because the archive reader was also changed to read it incorrectly).

llvm/include/llvm/Object/Archive.h
40

The clone() method should return a std::unique_ptr<AbstractArchiveMemberHeader> to avoid raw memory management

198

There's a lot of misc formatting changes in this patch. I formatted everything in e41aaea26238d0a5cb19163863819786e24f0e02, so if you rebase past that, the diff will be cleaner.

350

Does this need to be static? e.g. this could break if any tool needed to read two separate big archives

EGuesnet updated this revision to Diff 348513.May 28 2021, 6:51 AM

Updating D100651: [AIX] Support of Big archive (read)

EGuesnet marked 2 inline comments as done.May 28 2021, 6:55 AM

I have modified ArchiveWriter.cpp to avoid the Warning.

I used now an unique_ptr. Code was updated (and simplified) as a consequence. Changes are mainly in Child declaration in Archive.h and Child constructors in Archive.cpp.

EGuesnet added inline comments.May 28 2021, 7:09 AM
llvm/include/llvm/Object/Archive.h
350

As far as I know, static variable is needed. Location must be modified by Child and must be stored in Parent. As Child cannot modify Parent, it is the easiest way to deal with.
I agree it could break other tools that read two separate big archive, but I have tested this case: Rust reads lot of separate archives, and probably read them twice in some cases. And it works. The first version was broken due to static variables, but it is now correctly implemented.

Hi, @rupprecht, @jhenderson,
Any news?
It will be better if this PR (and D104367) can be integrated in next LLVM version, and the merge windows will be closed soon.
Thanks.

Hi, @rupprecht, @jhenderson,
Any news?
It will be better if this PR (and D104367) can be integrated in next LLVM version, and the merge windows will be closed soon.
Thanks.

I was leaving @rupprecht to continue reviewing this, but will try to find time to take a look further in the coming days if he doesn't.

Okay, I've spent a bit of time doing some more reviewing here.

Some high-level comments:

  1. You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).
  2. Update the review summary to be the body of the proposed commit message.
  3. The "Big Archive" format seems to be an AIX-specific thing. I wonder if we should refer to it as the "AIX Big Archive" format everywhere, to avoid confusion. See also inline comments.
  4. I think you should be using the archive member headers directly more often.

I haven't had time to fully go through this review, but this should get you started.

llvm/include/llvm/Object/Archive.h
301–309

Archives can generally hold files of any format, including non-object files like text files. It seems to me the corect name should be K_AIXBIG. What do you think?

llvm/lib/Object/Archive.cpp
720–723

This comment implies an empty archive is the same across all formats, but it looks like that's not the case for the Big Archive format. As I understand from my reading, we can't iterate over children to see if the archive is truly empty using the GNU format, for an AIX archive, so the comment needs updating.

758

I'd move this logic below the following comment, since although the process is different, it is still determining the archive format.

You can simply add a note saying something like:

// AIX Big archive format
//   Identified purely by magic bytes and uses a unique format.
760–761

Don't reflow your comments before the 80-character column width. Let clang-format do that for you.

I'm not really sure what this comment is trying to say, if I'm honest.

974

Do you actually need to change this method in the first instance? You don't have any testing for symbol printing (and I don't think you should at this point...).

979–982
llvm/lib/Object/ArchiveWriter.cpp
141 ↗(On Diff #348513)

Make sure to clang-format all your modified code.

I'd actually use report_fatal_error() rather than llvm_unreachable(), if the code could be reached by a tool receiving a file of this wrong kind, with the corresponding options. If it is genuinely unreachable of course, there is no need to change.

Same below.

llvm/test/Object/archive-big-read.test
1 ↗(On Diff #348513)

Two related nits:

  1. Although strictly not necessary, I find it slightly disorientating when seeing a lit test without comment markers for RUN and CHECK lines. I'd add them.
  2. For true comments, I encourage in newer tests to use ## to disitnguish from CHECK/RUN lines.

Also, comments should end with a full stop ('.') - applies below too.

2 ↗(On Diff #348513)

I'd rename the binary to bigFormat.a to avoid confusing it with a large regular archive.

Also, a personal preference is to use double dash for long options (i.e. --strict-whitespace).

Also, I'd add --implicit-check-not={{.}} to the FileCheck line. Without it the following output would be successfully matched against, because FileCheck matches sub-strings:

rw-r--r-- 0/0      8 Apr 21 14:12 2021 evenlendsabdjasbababfjbasjk

It also ensures there's no output before the first line or after the last checked line.

7 ↗(On Diff #348513)

I think we should test each operation in isolation. In other words, test t in a separate file to p. One is about reading the archive member list, the other about the archive member contents, so although the latter relies on the former, the inverse isn't true.

8 ↗(On Diff #348513)

I'm assuming the file contents are "evenlen". I'd actually be inclined to change the contents to be different to the member name. That way, it prevents a bug where the operation is returning the wrong thing.

I think you should also add to the FileCheck line --implicit-check-not={{.}}. This will ensure that there is no other output printed, as by default FileCheck looks for sub-strings.

jhenderson added inline comments.Jul 9 2021, 2:07 AM
llvm/include/llvm/Object/Archive.h
345–347

Do you really need this length? Why is it uint32_t?

350

This is completely unthread safe - if you have a tool parsing two archives simultaneously, this will fail horribly. Do you really need it though? See my comments further down.

From reading the spec, the archive uses offsets everywhere and it should be simple to use the values stored in the member headers.

llvm/lib/Object/Archive.cpp
455

Don't add this comment. We can see that these are child constructors by the function signature.

460

Use std::make_unique to create a unique pointer without needing to explicitly mention new. Same everywhere else you've done this.

469

Don't add a blank line at the function start.

608

It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.

EGuesnet updated this revision to Diff 358272.Jul 13 2021, 8:24 AM

[AIX] Support of Big archive (read)

Add support of AIX Big Archive. Read only.

EGuesnet edited the summary of this revision. (Show Details)Jul 13 2021, 8:25 AM
EGuesnet marked 14 inline comments as done.Jul 13 2021, 8:44 AM

You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).

The IBM team that has sent PR about llvm-readobj is aware of this PR. I will recontact them.

I think I have respond to most of your comment. The main issue is the static variable. I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.
Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.

llvm/include/llvm/Object/Archive.h
345–347

Maximal offset (in fixed length header) is stored as 20 bytes as decimal digit. So, maximal length is approximately 10^10. It is the total length of the archive. So, it must be uint64-t, not uint32_t.

350

Main trouble is with first read, to distinguish Fixed length header and other one.
We must know the current location, and modify it during read. This cannot be done in Parent (because of const) or Header (not access when it is required). Static variable in archive is the least bad solution I have found.

llvm/lib/Object/Archive.cpp
608

I think you may be trying too hard to match the existing logic

Right.

My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.

But using offset needs to know the current location. You say in another comment this his highly not thread-safe and might be avoid.
Moreover, if I use fl_lstmoff instead of Length, I must have an access to Fixed length header. It is currently read and forbidden as length of object in archive in the only information we need.

974

I have removed code related to symbol read.

llvm/lib/Object/ArchiveWriter.cpp
141 ↗(On Diff #348513)

It can be reached by external tools (not llvm-ar), so I modifie it to use report_fatal_error().

llvm/test/Object/archive-big-read.test
1 ↗(On Diff #348513)

I think it is OK now.
I have first done the test inspiring from other one, and they do not use theses convention and flag.

3 ↗(On Diff #339243)

Done. See archive-big-extract.test.

EGuesnet updated this revision to Diff 358279.Jul 13 2021, 8:46 AM
EGuesnet marked 3 inline comments as done.

Use of clang-format.

I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.

I still don't understand why you need to know it to make this distinction. The two header types are completely different things as far as I understand the spec, used in different contexts. Could you read the Fixed Length header as the first part of reading the archive, and the object headers when reading individual children, then store them as members of the respective classes?

Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.

Just because this works in one single case does not make this an appropriate design - in fact you haven't even tested the case where the issue is, as I bet that the thing you ran read the archives sequentially, not in parallel. What if I wrote a tool that wanted to read multiple Big-format archives in parallel, with one thread per archive? What would happen to the static variable on each individual thread?

Pseudo-code example reading algorithm that doesn't have the issues mentioned:

class Archive {
    FixedLengthHeader Header;
    std::vector<Child> Children;
    ArrayRef<uint8_t> Data;

    Archive(ArrayRef<uint8_t> Data) : Data(Data) {
      Header = reinterpret_cast<FixedLengthHeader> (Data.data());
      uint64_t Offset = Header->FirstChildOffset;
      // Check if archive is empty.
      Child C(Offset, this);
      while (C != nullptr) {
        Children.push_back(C);
        C = C.nextChild();
      }
    }
};

class Child {
    ObjectHeader Header;
    uint8_t *Payload;
    Archive &Parent;
    Child(uint64_t Offset, Archive &Archive) {
      // Check Archive data is large enough for offset + header size + payload size.
      Parent = Archive;
      Header = reinterpret_cast<ObjectHeader>(Parent->Data.data() + Offset);
      Payload = Parent->Data.data() + Offset + sizeof(Header);
    }
    Child *nextChild() {
      if (Header->nextOffset == 0)
        return nullptr;
      return Child(Header->nextOffset, Parent);
    }
};
llvm/lib/Object/Archive.cpp
608

It is currently read and forbidden as length of object in archive in the only information we need.

I don't understand this comment at all. Do you mean forgotten, not forbidden? Perhaps you could change how things are read and stored?

llvm/test/Object/archive-big-read.test
1 ↗(On Diff #348513)

Older tests use older styles. We have developed better styles in the meantime which are preferred for newer tests.

Hi @jhenderson,
I cannot follow exactly what you propose, because llvm-ar (and maybe other tools) uses directly child_iterator. As this object does not have access to Archive (=Parent), it seems difficult to transfert Child vector of Archive to child_iterator. Moreover, I want to avoid to un-const Parent in Child.

Using the same idea, but not the same exact way, I propose:

  • Move FixedLengthHeader on Archive, and use it for Big Archive only.
  • Deal with first read (Fixed Length Header) in Archive creation, so any Child after do not care about it (one of the reason of static variable in previous version).
  • To distinguish Object and Member Table, I cannot use Next Member in Object Header, because it points to Member Table in the last Object; I cannot use Offset to Member Table of Fixed Length Header, because I do not memorize the current position. So, I use a trick: an Object must have a name (adding empty-file to an archive is non valid), and Member Table Header does not have name, so name field is "`\n" (terminator).
  • The child_begin_bigarchive() function is equivalent to child_begin(), but it deals correctly with Fixed Length Header.

getSize() and Big Archive Header initialiser are simplified, differences between usual Archive and Big Archive is deal during Child and Archive initialisation (so, correctly separated).

Free List is not yet supported: if you remove object in Archive, it will become unreadable by LLVM. Free List is complex, and not fully documented. I use report_fatal_error() if Free List is present in Archive (so, not 0), or if it is impossible to read it.

Esme added a subscriber: Esme.Aug 17 2021, 5:21 AM
Esme added inline comments.
llvm/test/Object/archive-big-extract.test
1 ↗(On Diff #361218)

Can you please add the command of creating the ar file to the comment? Base on this patch, I am trying to enable tools (e.g. llvm-readobj) for ar format, but I am getting an error from BigArchiveMemberHeader::getSize() when applying this patch.

# Create the ar file under AIX
$ ar -v -q 1.a 1.o
$ llvm-ar p 1.a evenlen
llvm-ar: error: unable to load '2.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128)
EGuesnet added inline comments.Aug 17 2021, 5:44 AM
llvm/test/Object/archive-big-extract.test
1 ↗(On Diff #361218)

Hi @Esme ,
I just procede as:

$ # On AIX
$ echo evenlen > evenlen
$ ar -v -q 1.a evenlen
$ # On any server
$ llvm-ar p 1.a evenlen
evenlen

As your error message is related to 2.a, I think there is something wrong to reproduce your error.

Esme added inline comments.Sep 7 2021, 11:18 PM
llvm/lib/Object/Archive.cpp
969

It's not correct to calculate the offset of the first archive member by Data.getBufferStart() + strlen(Magic) + sizeof(Archive::ArFixLenHdrType);, please use the value of ArFixLenHdr->FirstArOffset.
You can easily reproduce the bug if you test a archive file which contains a object file member like the comment I added before:

$ xlc 1.c -o 1.o
$ ar -v -q 1.a 1.o
$ llvm-ar tv 1.a
llvm-ar: error: unable to load '1.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128)

The correct offset for this case should be 138 instead of 128.

EGuesnet added inline comments.Sep 14 2021, 12:51 AM
llvm/lib/Object/Archive.cpp
969

First, LLVM is no more a priority of our team, so I cannot spend time to this PR.
Second, I am not able to reproduce. Please give me content of 1.c, and content of 1.a.

// small.c
int add_two (int a) {
        return a + 2;
}

$ xlc -v
C for AIX Compiler, Version 5

OK, really old, but we don't care for archive.

$ xlc small.c  -o small.o -c
$ ar -v -q small.a small.o

$ xxd small.a

00000000: 3c62 6967 6166 3e0a 3130 3730 2020 2020  <bigaf>.1070
00000010: 2020 2020 2020 2020 2020 2020 3132 3332              1232
00000020: 2020 2020 2020 2020 2020 2020 2020 2020
00000030: 3020 2020 2020 2020 2020 2020 2020 2020  0
00000040: 2020 2020 3132 3820 2020 2020 2020 2020      128
00000050: 2020 2020 2020 2020 3132 3820 2020 2020          128
00000060: 2020 2020 2020 2020 2020 2020 3020 2020              0
00000070: 2020 2020 2020 2020 2020 2020 2020 2020
// End of Fixed-Length Header: size is 128
00000080: 3832 3020 2020 2020 2020 2020 2020 2020  820
00000090: 2020 2020 3130 3730 2020 2020 2020 2020      1070
000000a0: 2020 2020 2020 2020 3020 2020 2020 2020          0
000000b0: 2020 2020 2020 2020 2020 2020 3136 3331              1631
000000c0: 3630 3435 3039 2020 3020 2020 2020 2020  604509  0
000000d0: 2020 2020 3020 2020 2020 2020 2020 2020      0
000000e0: 3634 3420 2020 2020 2020 2020 3720 2020  644         7
[...]

$ llvm-ar --version
LLVM version 13.0.0

$ llvm-ar t small.a
small.o # OK

Third, according documentation https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big , Fixed-Length Header must have a size of 128. 138 is not valide.

Esme added inline comments.Sep 14 2021, 2:43 AM
llvm/lib/Object/Archive.cpp
969

Thanks for your reply!

  1. If you are no longer following up on this patch, I'd be happy to continue your work. How do you think?
  1. In my previous comment, a binary member is added to the archive, while in your case, there is an object member (compiled with -c option) in archive.

You can also reproduce the issue if you add a dynamic lib to an archive.

$ cat 1.c
int main() {
return 1;
}
$ xlc 1.c -qmkshrobj -o libt.so
$ ar -v -q 1.a libt.so
$ xxd 1.a 
00000000: 3c62 6967 6166 3e0a 3134 3532 2020 2020  <bigaf>.1452    
00000010: 2020 2020 2020 2020 2020 2020 3136 3134              1614
00000020: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000030: 3020 2020 2020 2020 2020 2020 2020 2020  0               
00000040: 2020 2020 3133 3420 2020 2020 2020 2020      134         
00000050: 2020 2020 2020 2020 3133 3420 2020 2020          134     
00000060: 2020 2020 2020 2020 2020 2020 3020 2020              0   
00000070: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000080: 0000 0000 0000 3131 3935 2020 2020 2020  ......1195    
// There are some padding between the Fixed-Length Header and the First member.  
00000090: 2020 2020 2020 2020 2020 3134 3532 2020            1452  
000000a0: 2020 2020 2020 2020 2020 2020 2020 3020                0 
000000b0: 2020 2020 2020 2020 2020 2020 2020 2020
  1. It's correct that Fixed-Length Header must have a size of 128. 138 is not valide. As above, there may be some padding between the Fixed-Length Header and the first member. So I think the value of ArFixLenHdr->FirstArOffset is the exact offset to the first member.
EGuesnet added inline comments.Sep 14 2021, 5:43 AM
llvm/lib/Object/Archive.cpp
969

Using ArFixLenHdr->FirstArOffset does not work.

truncated or malformed archive (characters in name length field in archive header are not all decimal numbers: '' for archive member header at offset 68)

I do not know where this "68" is from.

In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests.

Esme added inline comments.Sep 14 2021, 7:31 PM
llvm/lib/Object/Archive.cpp
969

OK, thanks!
Do you get the FirstArOffset value via StringRef(ArFixLenHdr->FirstArOffset, sizeof(ArFixLenHdr->FirstArOffset)).rtrim(' ').getAsInteger(10, Size) ?

EGuesnet added inline comments.Sep 15 2021, 3:28 AM
llvm/lib/Object/Archive.cpp
969

My mistake, you are right. I take FirstArOffset directly...
But in order to treat the padded case, you must add test, with binary file. LLVM community want to avoid adding binary to test. So, I think again a separate PR after this one is preferable.

DiggerLin added inline comments.
llvm/lib/Object/Archive.cpp
106

There are common source code in the BigArchiveMemberHeader::BigArchiveMemberHeader and ArchiveMemberHeader::ArchiveMemberHeader , we can put the common code in the constructor of AbstractArchiveMemberHeader

DiggerLin added inline comments.Sep 28 2021, 12:16 PM
llvm/include/llvm/Object/Archive.h
315

I do not think we need a special child_begin_bigarchive() API for aix big ar here,
we can use the same API child_begin() , there is a member
Format in the Archive, we use have different implement of child_begin() base on the value of "Format" , if the
Format is K_AIXBIG(), we have different implement of child_begin().

gentle ping, we need the patch, can you help to speed it up ? @EGuesnet

jsji added a reviewer: Restricted Project.Sep 29 2021, 7:48 PM

Hi @DiggerLin,

I do not have time to improve this PR before begin of November. We have other priorities right now.
This PR has nearly 6 months, so, it can wait one more.

DiggerLin added inline comments.Sep 30 2021, 11:52 AM
llvm/include/llvm/Object/Archive.h
98–151

both the ArchiveMemberHeader and the BigArchiveMemberHeader has the "Archive const *Parent;" , can the member put in the AbstractArchiveMemberHeader ?

llvm/lib/Object/Archive.cpp
64

if the RawHeaderPtr == nullptr is true , the value of ArMemHdr will be random here.

110

same problem as above.

177

the variable name "end" can not express the mean, maybe be "NameSize" or "NameLen" ?

319–368

there are several place use the almost the same code from line 308 to 320 . can we change these lines into a helper function?

524

change "Name.size() + Name.size() % 2" to ((Name.size()+1) >>1 ) << 1 ?

532

the Name.startswith("#1/")) is not for the Archive::K_AIXBIG. what happen if the AIXBIG archive file which has member name begin with "#/" here ?

DiggerLin requested changes to this revision.Sep 30 2021, 2:08 PM
DiggerLin added inline comments.
llvm/lib/Object/Archive.cpp
281

it looks the parameter Size not be used here?

This revision now requires changes to proceed.Sep 30 2021, 2:08 PM
This revision now requires review to proceed.Sep 30 2021, 2:15 PM
DiggerLin added inline comments.Oct 1 2021, 7:03 AM
llvm/include/llvm/Object/Archive.h
315

also using a separate interface child_begin_bigarchive , you can not use the API

iterator_range<child_iterator> children(Error &Err,
                                        bool SkipInternal = true) const {
  return make_range(child_begin(Err, SkipInternal), child_end());

the function file_magic llvm::identify_magic(StringRef Magic) should be modified to add big ar identification. otherwise object::createBinary will be error for big ar.

DiggerLin added inline comments.Oct 5 2021, 6:49 AM
llvm/include/llvm/Object/Archive.h
315

or you can call child_begin_bigarchive() in child_begin(Error &Err, bool SkipInternal = true) when Format is K_AIXBIG()

DiggerLin added inline comments.Oct 6 2021, 3:05 PM
llvm/lib/Object/Archive.cpp
319–368

and I just wonder why you need the code

std::string Buf;
    raw_string_ostream OS(Buf);
    OS.write_escaped(
        StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" "));
    OS.flush();

to convert to std::string.

for the StringRef , there is a function str() to convert to std::string, why not use str() of StringRef ?

can you address the comment ,update the patch? we will approve it and We will create a following patch for the "Support of Big archive" @EGuesnet

jsji added a subscriber: jsji.Oct 12 2021, 6:48 AM

In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests.

As @EGuesnet mentioned above, I think he won't have time to address the comments in time.
However, we do depend on this patch for AIX in near future,
so I would suggest that we *accept* this patch *as it is* for now, and commit it as @EGuesnet suggested.

And @DiggerLin will post follow up patches immediately.

Is that OK for reviewers? @jhenderson @rupprecht @Esme ?

In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests.

As @EGuesnet mentioned above, I think he won't have time to address the comments in time.
However, we do depend on this patch for AIX in near future,
so I would suggest that we *accept* this patch *as it is* for now, and commit it as @EGuesnet suggested.

And @DiggerLin will post follow up patches immediately.

Is that OK for reviewers? @jhenderson @rupprecht @Esme ?

This patch is not in a suitable state ready for commit. I've highlighted a high number of issues, and I think there are some fundamental issues with the approach, which need addressing before it is acceptable to land this in LLVM. I haven't finished reviewing the whole patch yet either. I'm sorry if you need the patch in the near future. I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).

llvm/include/llvm/Object/Archive.h
39–40

This comment is superfluous. clone always means this.

48–52

getOffset isn't named getRawOffset, so doesn't look like it belong in this block.

getRawName on the other hand looks like it does belong here.

Please make sure all comments follow the existing style. At the moment, comments are doxygen style (see e.g. getName). LLVM style also uses trailing full stops on comments - please add them where they are missing. That being said, don't bother with comments that provide no useful benefit beyond the method names. For example, you do not need this comment or the "Non-raw getters" comment: what benefit do they provide?

103

Please order const according to standard LLVM style, i.e. const T *.

If you're changing a function anyway, it makes sense to fix the style.

105

Please separate unrelated method declarations, or methods with non-single-line definitions with a blank line.

177

Do we actually need all these new constructors and assignment operators? Are they used?

179

Under what circumstances can C.Header be a nullptr?

182

Nit: add new line before this method.

183

Parent is a raw pointer, there's no need to std::move it.

346

Nit: please use // comments in this struct, as per the rest of the code.

As I understand it, this is a specific BigArchive class. Please rename it to be clear, e.g. BigArchiveFixLenHdr.

354

I think the presence of this member shows that the design is not right.

BigArchive is a kind of Archive. The members specific to BigArchive types (as opposed to e.g. traditional BSD or GNU archives) should not be present in a type that should be generic.

I think rather than, or possibly as well as, having this member, you really want to have concrete Archive subclasses that implement the relevant logic, and make Archive an abstract class.

llvm/lib/Object/Archive.cpp
46

I don't like this define. It is not guaranteed that all future archive magic is 8 characters long. I don't think it would be necessary if we made the Archive class a class hierarchy as suggested above.

I'm also slightly surprised that BigMagic doesn't start with !, but I assume that is correct and intentional.

68

Why has this code moved?

131

Terminator characters are cosmetic only, really, in regular archives too, but we still check them.

173–174

This comment has several typos in, and I'm not really sure what it is actually trying to say.

314–316

I'd prefer that this calculation be factored out into at least a variable, as I don't follow what it represents.

320–322

This pattern is a duplicate of most of the earlier expression. Put it in a variable rather than duplicating code.

402

getRawName appears not to trim the trailing spaces, but getRawUID does. This seems like a logical inconsistency that will likely lead to bugs.

473

These if/else checks clearly indicate that Child should really be BigArchiveChild and something else, sharing a common Child abstract base class.

620–625

Revert this change.

659–660

This looks like an unrelated formatting change.

jhenderson requested changes to this revision.Oct 12 2021, 7:55 AM
This revision now requires changes to proceed.Oct 12 2021, 7:55 AM
jsji added a comment.Oct 12 2021, 7:59 AM

I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).

Thanks @jhenderson ! That would work too . @DiggerLin Let us do it -- you can create another patch based on this and give credit to @EGuesnet if he can NOT address the comment in time.

I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).

Thanks @jhenderson ! That would work too . @DiggerLin Let us do it -- you can create another patch based on this and give credit to @EGuesnet if he can NOT address the comment in time.

yes, it is no problem for me. thanks .

Hi @jhenderson, @jsji , @DiggerLin,
As I said before, I do not have time to improve this PR before one or two months at least, as our priorities have shifted.
My proposal to @Esme about accepting the patch with minor changes was OK but now there are too much changes requested by reviewers.
So, I accept that you start an improvement / alternative version as long as you credit.

Please tag me when you will open your PR.
For your information, I have opened another PR about writing Big Archive here: https://reviews.llvm.org/D104367 .

jsji added a comment.Oct 13 2021, 6:32 AM

Hi @jhenderson, @jsji , @DiggerLin,
As I said before, I do not have time to improve this PR before one or two months at least, as our priorities have shifted.
My proposal to @Esme about accepting the patch with minor changes was OK but now there are too much changes requested by reviewers.
So, I accept that you start an improvement / alternative version as long as you credit.

Please tag me when you will open your PR.
For your information, I have opened another PR about writing Big Archive here: https://reviews.llvm.org/D104367 .

Great! Thank you for your kind and contribution to this! @EGuesnet

@DiggerLin Please credit and tag @EGuesnet in your new patch. Thanks.