This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/include/llvm/Object/Archive.h
- llvm/lib/Object/Archive.cpp

Differential D100651

[AIX] Support of Big archive (read)
Needs Revision · Public

Authored by EGuesnet on Apr 16 2021, 7:58 AM.


Details

Reviewers

rupprecht
MaskRay
jhenderson
gbreynoo
DiggerLin

Group Reviewers

Restricted Project

Summary

Add support for AIX Big Archive (read only).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

EGuesnet created this revision. Apr 16 2021, 7:58 AM

Herald added subscribers: hiraditya, kristof.beyls. Apr 16 2021, 7:58 AM

EGuesnet requested review of this revision. Apr 16 2021, 7:58 AM

Herald added a project: Restricted Project. Apr 16 2021, 7:58 AM

Herald added a subscriber: llvm-commits.

EGuesnet edited the summary of this revision. Apr 16 2021, 8:00 AM

Harbormaster completed remote builds in B99182: Diff 338117. Apr 16 2021, 9:26 AM

hubert.reinterpretcast retitled this revision from Support of Big archive (read) to [AIX] Support of Big archive (read). Apr 16 2021, 9:59 AM

hubert.reinterpretcast added subscribers: daltenty, hubert.reinterpretcast.

Handle memory correctly (destructor, operator=) and add a test. No regressions found.
From my point of view, it is ready for review.

EGuesnet edited the summary of this revision. Apr 21 2021, 8:08 AM

Harbormaster completed remote builds in B99990: Diff 339243. Apr 21 2021, 9:02 AM

dblaikie added a subscriber: dblaikie. Apr 21 2021, 5:26 PM

rupprecht added reviewers: MaskRay, jhenderson. Apr 22 2021, 1:30 PM

Added a few others that review binutils. Thanks for the patch!

Looks like the Windows pre-merge check is failing: https://reviews.llvm.org/harbormaster/unit/view/554528/. I can reproduce locally with ninja check-llvm-object. It looks like the archive is empty? Maybe try regenerating it and uploading the patch again.

$ file llvm/test/Object/Inputs/Big.a
llvm/test/Object/Inputs/Big.a: empty

While building this, there's a warning about an unhandled switch:

/home/rupprecht/src/llvm-project/llvm/lib/Object/ArchiveWriter.cpp:132:11: warning: enumeration value 'K_XCOFF' not handled in switch [-Wswitch]
  switch (Kind) {
          ^
/home/rupprecht/src/llvm-project/llvm/lib/Object/ArchiveWriter.cpp:197:11: warning: enumeration value 'K_XCOFF' not handled in switch [-Wswitch]
  switch (Kind) {
          ^

(Didn't get around to reviewing the main source changes in depth yet...)

llvm/test/Object/archive-big-read.test
1 ↗	(On Diff #339243)	Instead of cd'ing to the source tree, these commands should be executed in the build tree -- i.e. drop this line, and change below to `llvm-ar tv %p/Inputs/Big.a`
3 ↗	(On Diff #339243)	nit: it isn't technically required, but this should start with a comment (i.e. `# Test reading ...`) to separate it more clearly from run/check commands

Add binary file

I've not got the time to review this properly currently.

From the brief glance I've given this so far:

There doesn't seem to be as much testing as I'd expect given the amount of new code.
Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).
I'm not a fan of the complication of the existing archive header code by converting it into a virtual hierarchy. That being said, I haven't looked at the rest of the code properly to give any thought as to whether there's a better approach.

Finally, what's the use-case? Why do you need this functionality?

EGuesnet updated this revision to Diff 339933. Apr 23 2021, 1:36 AM

This comment was removed by EGuesnet.

EGuesnet updated this revision to Diff 339936. Apr 23 2021, 1:41 AM

EGuesnet marked 2 inline comments as done.

This comment was removed by EGuesnet.

[AIX] Support of Big archive (read)

Harbormaster completed remote builds in B100500: Diff 339933. Apr 23 2021, 1:54 AM

Harbormaster completed remote builds in B100503: Diff 339936.

First, sorry for the noise; I have not mastered arc yet.

I have updated the test file according to the comments.

Looks like the Windows pre-merge check is failing: https://reviews.llvm.org/harbormaster/unit/view/554528/. I can reproduce locally with ninja check-llvm-object. It looks like the archive is empty? Maybe try regenerating it and uploading the patch again.

I have updated it correctly. Thanks.

While building this, there's a warning about an unhandled switch:

It occurs in ArchiveWriter.cpp. I am implementing a way to write Big Archives. If you prefer, I can add some code to silence the warning, but my goal is to implement both read and write support for Big Archive, so it will be fixed in a future commit/PR.

There doesn't seem to be as much testing as I'd expect given the amount of new code.

Reading an archive covers the list (t), extract (x) and print (p) operations. x and p share the same underlying code. I test t and p. What would you like me to add?

Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).

I agree it should be avoided. Unfortunately, we cannot create the archive at run time, as the write operation is not implemented yet (I am working on it). Moreover, yaml2obj has not been ported to AIX (XCOFF object format). In the future, once the write operation is implemented for Big Archive, the binary should be deleted.

I'm not a fan of the complication of the existing archive header code by converting it into a virtual hierarchy. That being said, I haven't looked at the rest of the code properly to give any thought as to whether there's a better approach.

We must deal with two structs of different sizes (ArMemHdr). I have not found another clean way to handle this. Moreover, Big Archives are quite different from other archives: the terminator is cosmetic, the Fixed-Length Header is different from the object header, object names are handled in a totally different way, and so on.
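To make the size difference concrete, here is a sketch of the Big Archive headers next to the classic member header, written as plain C++ structs. The layout follows our reading of IBM's "ar File Format (Big)" documentation, with field names as in AIX `<ar.h>`. The struct names are hypothetical and are not LLVM's declarations. The offsets agree with the `xxd` dumps posted in this review (for example, fl_fstmoff starts at byte 0x44 = 68).

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the AIX Big Archive headers (field names as in AIX <ar.h>).
// Every field is space-padded ASCII text: decimal, except ar_mode (octal).
struct BigArFixedLengthHeader {
  char fl_magic[8];     // "<bigaf>\n"
  char fl_memoff[20];   // Offset of the member table.
  char fl_gstoff[20];   // Offset of the 32-bit global symbol table.
  char fl_gst64off[20]; // Offset of the 64-bit global symbol table.
  char fl_fstmoff[20];  // Offset of the first member.
  char fl_lstmoff[20];  // Offset of the last member.
  char fl_freeoff[20];  // Offset of the first member on the free list.
};

struct BigArMemberHeader {
  char ar_size[20];   // Member size.
  char ar_nxtmem[20]; // Offset of the next member.
  char ar_prvmem[20]; // Offset of the previous member.
  char ar_date[12];
  char ar_uid[12];
  char ar_gid[12];
  char ar_mode[12];
  char ar_namlen[4];  // Name length; the name and "`\n" follow the header.
};

// For comparison: the classic (GNU/BSD) member header.
struct ClassicArMemberHeader {
  char ar_name[16];
  char ar_date[12];
  char ar_uid[6];
  char ar_gid[6];
  char ar_mode[8];
  char ar_size[10];
  char ar_fmag[2]; // "`\n"
};

// char-only structs have alignment 1, so in practice no padding is inserted.
static_assert(sizeof(BigArFixedLengthHeader) == 128, "fixed-length header");
static_assert(sizeof(BigArMemberHeader) == 112, "big member header, no name");
static_assert(sizeof(ClassicArMemberHeader) == 60, "classic member header");
static_assert(offsetof(BigArFixedLengthHeader, fl_fstmoff) == 68,
              "fl_fstmoff position");
```

A single member-header struct cannot describe both formats, because they differ in size (112 vs 60 bytes before the name) and in field order.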

Finally, what's the use-case? Why do you need this functionality?

I want to port Rust to AIX. Rust needs archives, and it uses the LLVM library to handle them. It is not possible (or really hard) to use an external tool, like the system ar. So I must implement read and write support for Big Archive (the AIX archive format). This PR is the first part: reading Big Archives.
As far as I know, it will also be useful for porting Clang to AIX, as it will allow porting the CreateExportLists tool. This tool is used to export symbols, a required step for producing correct binaries on AIX.

Harbormaster completed remote builds in B100499: Diff 339931. Apr 23 2021, 3:34 AM

Harbormaster completed remote builds in B100507: Diff 339941. Apr 23 2021, 3:42 AM

Updating D100651: [AIX] Support of Big archive (read)

The previous version assumed that libLLVM was loaded once per archive read. That was fine with llvm-ar and all the checks I performed, but it is wrong with Rust: Rust loads libLLVM and then loads all the libraries.

I have fixed this by removing a static variable and by changing the way LLVM checks whether a header is the fixed-size header or not.

Harbormaster completed remote builds in B104481: Diff 345415. May 14 2021, 6:43 AM

[AIX] Support of Big archive (read)

One-line change to reduce use of the CurrentLocation static variable.

Harbormaster completed remote builds in B106281: Diff 347942. May 26 2021, 6:33 AM

rupprecht mentioned this in rGe41aaea26238: [NFC][libObject] clang-format Archive{.h,.cpp}. May 27 2021, 4:51 PM

While building this, there's a warning about an unhandled switch:

It occurs in ArchiveWriter.cpp. I am implementing a way to write Big Archives. If you prefer, I can add some code to silence the warning, but my goal is to implement both read and write support for Big Archive, so it will be fixed in a future commit/PR.

Many people build with -Werror, so it would be good to handle the case even if it's explicitly ignored, like:

case K_XCOFF:
  llvm_unreachable("Not handled yet");

Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).

I agree it should be avoided. Unfortunately, we cannot create the archive at run time, as the write operation is not implemented yet (I am working on it). Moreover, yaml2obj has not been ported to AIX (XCOFF object format). In the future, once the write operation is implemented for Big Archive, the binary should be deleted.

Generally I agree that prebuilt binaries should be avoided for testing (mostly because they are opaque), but we should also have at least one prebuilt binary to guard against symmetric bugs (imagine the writer creating a bad archive, with the bug going unnoticed because the archive reader was also changed to read it incorrectly).

llvm/include/llvm/Object/Archive.h
40	The clone() method should return a std::unique_ptr<AbstractArchiveMemberHeader> to avoid raw memory management
198	There's a lot of misc formatting changes in this patch. I formatted everything in e41aaea26238d0a5cb19163863819786e24f0e02, so if you rebase past that, the diff will be cleaner.
350	Does this need to be static? e.g. this could break if any tool needed to read two separate big archives
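Here is a minimal sketch of the virtual-clone idiom suggested above. Because clone() returns std::unique_ptr, copying a Child-like owner needs no manual delete. Only the class names come from the patch. The formatName() accessor and the ChildSketch type are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

class AbstractArchiveMemberHeader {
public:
  virtual ~AbstractArchiveMemberHeader() = default;
  // Returning unique_ptr makes ownership of the copy explicit.
  virtual std::unique_ptr<AbstractArchiveMemberHeader> clone() const = 0;
  virtual std::string formatName() const = 0; // Hypothetical accessor.
};

class ArchiveMemberHeader : public AbstractArchiveMemberHeader {
public:
  std::unique_ptr<AbstractArchiveMemberHeader> clone() const override {
    return std::make_unique<ArchiveMemberHeader>(*this);
  }
  std::string formatName() const override { return "GNU/BSD"; }
};

class BigArchiveMemberHeader : public AbstractArchiveMemberHeader {
public:
  std::unique_ptr<AbstractArchiveMemberHeader> clone() const override {
    return std::make_unique<BigArchiveMemberHeader>(*this);
  }
  std::string formatName() const override { return "AIX Big"; }
};

// A Child-like owner copies its header polymorphically; no raw new/delete.
struct ChildSketch {
  std::unique_ptr<AbstractArchiveMemberHeader> Header;
  explicit ChildSketch(const AbstractArchiveMemberHeader &H) : Header(H.clone()) {}
  ChildSketch(const ChildSketch &Other) : Header(Other.Header->clone()) {}
  ChildSketch &operator=(const ChildSketch &Other) {
    if (this != &Other)
      Header = Other.Header->clone();
    return *this;
  }
};
```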

Updating D100651: [AIX] Support of Big archive (read)

I have modified ArchiveWriter.cpp to avoid the warning.

I now use a unique_ptr. The code was updated (and simplified) as a consequence. The changes are mainly in the Child declaration in Archive.h and the Child constructors in Archive.cpp.

EGuesnet added inline comments. May 28 2021, 7:09 AM

llvm/include/llvm/Object/Archive.h
350	As far as I know, the static variable is needed. The location must be modified by Child and stored in Parent. As Child cannot modify Parent, this is the easiest way to handle it. I agree it could break other tools that read two separate big archives, but I have tested this case: Rust reads many separate archives, and probably reads some of them twice. And it works. The first version was broken due to static variables, but it is now correctly implemented.

Harbormaster completed remote builds in B106704: Diff 348513. May 28 2021, 7:11 AM

EGuesnet mentioned this in D104367: [AIX] Support of Big archive (write). Jun 16 2021, 3:38 AM

Any news, @rupprecht?

Hi, @rupprecht, @jhenderson,
Any news?
It would be better if this PR (and D104367) could be integrated into the next LLVM version, and the merge window will close soon.
Thanks.

In D100651#2861441, @EGuesnet wrote:

Hi, @rupprecht, @jhenderson,
Any news?
It would be better if this PR (and D104367) could be integrated into the next LLVM version, and the merge window will close soon.
Thanks.

I was leaving @rupprecht to continue reviewing this, but will try to find time to take a look further in the coming days if he doesn't.

Okay, I've spent a bit of time doing some more reviewing here.

Some high-level comments:

You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).
Update the review summary to be the body of the proposed commit message.
The "Big Archive" format seems to be an AIX-specific thing. I wonder if we should refer to it as the "AIX Big Archive" format everywhere, to avoid confusion. See also inline comments.
I think you should be using the archive member headers directly more often.

I haven't had time to fully go through this review, but this should get you started.

llvm/include/llvm/Object/Archive.h
301–309	Archives can generally hold files of any format, including non-object files like text files. It seems to me the correct name should be `K_AIXBIG`. What do you think?
llvm/lib/Object/Archive.cpp
720–723	This comment implies an empty archive is the same across all formats, but it looks like that's not the case for the Big Archive format. As I understand from my reading, we can't iterate over children to see if the archive is truly empty using the GNU format, for an AIX archive, so the comment needs updating.
758	I'd move this logic below the following comment, since although the process is different, it is still determining the archive format. You can simply add a note saying something like: // AIX Big archive format // Identified purely by magic bytes and uses a unique format.
760–761	Don't reflow your comments before the 80-character column width. Let clang-format do that for you. I'm not really sure what this comment is trying to say, if I'm honest.
974	Do you actually need to change this method in the first instance? You don't have any testing for symbol printing (and I don't think you should at this point...).
979–982
llvm/lib/Object/ArchiveWriter.cpp
141 ↗	(On Diff #348513)	Make sure to clang-format all your modified code. I'd actually use `report_fatal_error()` rather than `llvm_unreachable()`, if the code could be reached by a tool receiving a file of this wrong kind, with the corresponding options. If it is genuinely unreachable of course, there is no need to change. Same below.
llvm/test/Object/archive-big-read.test
1 ↗	(On Diff #348513)	Two related nits: although it is not strictly necessary, I find it slightly disorientating to see a lit test without comment markers for RUN and CHECK lines. I'd add them. For true comments, in newer tests I encourage using `##` to distinguish them from CHECK/RUN lines. Also, comments should end with a full stop ('.'). This applies below too.
2 ↗	(On Diff #348513)	I'd rename the binary to `bigFormat.a` to avoid confusing it with a large regular archive. Also, a personal preference is to use double dash for long options (i.e. `--strict-whitespace`). Also, I'd add `--implicit-check-not={{.}}` to the FileCheck line. Without it the following output would be successfully matched against, because FileCheck matches sub-strings: rw-r--r-- 0/0 8 Apr 21 14:12 2021 evenlendsabdjasbababfjbasjk It also ensures there's no output before the first line or after the last checked line.
7 ↗	(On Diff #348513)	I think we should test each operation in isolation. In other words, test `t` in a separate file to `p`. One is about reading the archive member list, the other about the archive member contents, so although the latter relies on the former, the inverse isn't true.
8 ↗	(On Diff #348513)	I'm assuming the file contents are "evenlen". I'd actually be inclined to change the contents to be different to the member name. That way, it prevents a bug where the operation is returning the wrong thing. I think you should also add to the FileCheck line `--implicit-check-not={{.}}`. This will ensure that there is no other output printed, as by default FileCheck looks for sub-strings.

jhenderson added inline comments. Jul 9 2021, 2:07 AM

llvm/include/llvm/Object/Archive.h
345–347	Do you really need this length? Why is it `uint32_t`?
350	This is completely thread-unsafe: if you have a tool parsing two archives simultaneously, this will fail horribly. Do you really need it though? See my comments further down. From reading the spec, the archive uses offsets everywhere and it should be simple to use the values stored in the member headers.
llvm/lib/Object/Archive.cpp
455	Don't add this comment. We can see that these are child constructors by the function signature.
460	Use `std::make_unique` to create a unique pointer without needing to explicitly mention `new`. Same everywhere else you've done this.
469	Don't add a blank line at the function start.
608	It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.
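Here is a sketch of the traversal described above, operating on the raw archive bytes. It rests on several assumptions. Field offsets follow the Big Archive layout seen in the xxd dumps in this review: fl_fstmoff at byte 68, fl_lstmoff at byte 88, and ar_nxtmem at byte 20 of a member header. An empty archive is assumed to store 0 in fl_fstmoff. Payload bounds checks and detailed error messages are left out. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

constexpr size_t FixedHeaderSize = 128;  // Fixed-length header.
constexpr size_t MemberHeaderSize = 112; // Member header, name excluded.

// Parses a space-padded ASCII decimal field (overflow ignored in this sketch).
std::optional<uint64_t> parseDecimal(std::string_view Field) {
  while (!Field.empty() && Field.back() == ' ')
    Field.remove_suffix(1);
  if (Field.empty())
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : Field) {
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + static_cast<uint64_t>(C - '0');
  }
  return Value;
}

// Follows ar_nxtmem from fl_fstmoff until fl_lstmoff is reached and returns
// every member offset, or std::nullopt if the chain is malformed.
std::optional<std::vector<uint64_t>> memberOffsets(std::string_view Ar) {
  if (Ar.size() < FixedHeaderSize || Ar.substr(0, 8) != "<bigaf>\n")
    return std::nullopt;
  std::optional<uint64_t> First = parseDecimal(Ar.substr(68, 20)); // fl_fstmoff
  std::optional<uint64_t> Last = parseDecimal(Ar.substr(88, 20));  // fl_lstmoff
  if (!First || !Last)
    return std::nullopt;
  std::vector<uint64_t> Offsets;
  if (*First == 0)
    return Offsets; // Assumed encoding of an empty archive.
  uint64_t Offset = *First;
  // Every member needs MemberHeaderSize bytes, which bounds the loop even if
  // a corrupt ar_nxtmem chain forms a cycle.
  for (size_t Step = 0; Step <= Ar.size() / MemberHeaderSize; ++Step) {
    if (Offset < FixedHeaderSize || Offset > Ar.size() - MemberHeaderSize)
      return std::nullopt; // Member header would lie outside the buffer.
    Offsets.push_back(Offset);
    if (Offset == *Last)
      return Offsets; // fl_lstmoff identifies the final member.
    std::optional<uint64_t> Next = parseDecimal(Ar.substr(Offset + 20, 20));
    if (!Next)
      return std::nullopt; // ar_nxtmem is not a decimal number.
    Offset = *Next;
  }
  return std::nullopt; // The chain never reached fl_lstmoff.
}
```

The design point is that no position state is carried between calls: each step derives the next offset from the current member header alone.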

[AIX] Support of Big archive (read)

Add support for AIX Big Archive (read only).

EGuesnet edited the summary of this revision. Jul 13 2021, 8:25 AM

You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).

The IBM team that sent the llvm-readobj patches is aware of this PR. I will contact them again.

I think I have responded to most of your comments. The main issue is the static variable. I need to know the current location to distinguish the Fixed-Length Header from the object headers, and I have not found a cleaner solution.
Note that multiple reads have been checked: Rust loads libLLVM.so once and can read multiple archives. It works, as the static variable is initialized correctly before being used. But the reads are sequential.

llvm/include/llvm/Object/Archive.h
345–347	The maximal offset (in the fixed-length header) is stored as a 20-byte decimal field, so it can be far larger than what a uint32_t can hold (about 4.3 × 10^9). It is the total length of the archive, so it must be uint64_t, not uint32_t.
350	The main trouble is with the first read, which must distinguish the Fixed-Length Header from the other headers. We must know the current location and modify it during the read. This cannot be done in Parent (because it is const) or in the header (it is not accessible when required). A static variable in Archive is the least bad solution I have found.
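For reference, a 20-character decimal field can encode values up to 10^20 - 1. That exceeds both UINT32_MAX (4294967295) and UINT64_MAX (18446744073709551615), so besides using uint64_t, a parser should reject out-of-range values rather than wrap. Below is a sketch using std::from_chars (C++17), which reports std::errc::result_out_of_range on overflow. parseOffsetField is a hypothetical name.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Parses a left-aligned, space-padded decimal field into T, rejecting
// blanks, non-digits (e.g. NUL bytes) and values that do not fit in T.
template <typename T>
std::optional<T> parseOffsetField(std::string_view Field) {
  size_t End = Field.find_last_not_of(' ');
  if (End == std::string_view::npos)
    return std::nullopt; // All blanks.
  Field = Field.substr(0, End + 1);
  T Value{};
  auto [Ptr, Ec] =
      std::from_chars(Field.data(), Field.data() + Field.size(), Value);
  if (Ec != std::errc() || Ptr != Field.data() + Field.size())
    return std::nullopt; // Not a number, trailing junk, or out of range.
  return Value;
}
```

For example, "9999999999" is rejected for uint32_t but accepted for uint64_t, while a 20-digit "99999999999999999999" is rejected even for uint64_t.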
llvm/lib/Object/Archive.cpp
608	"I think you may be trying too hard to match the existing logic": right. "My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.": but using offsets requires knowing the current location. You said in another comment that this is highly thread-unsafe and should be avoided. Moreover, if I use fl_lstmoff instead of Length, I must have access to the Fixed-Length Header. It is currently read and forbidden as length of object in archive in the only information we need.
974	I have removed code related to symbol read.
llvm/lib/Object/ArchiveWriter.cpp
141 ↗	(On Diff #348513)	It can be reached by external tools (not llvm-ar), so I modified it to use `report_fatal_error()`.
llvm/test/Object/archive-big-read.test
1 ↗	(On Diff #348513)	I think it is OK now. I initially wrote the test based on other ones, which do not use these conventions and flags.
3 ↗	(On Diff #339243)	Done. See `archive-big-extract.test`.

Use of clang-format.

Harbormaster completed remote builds in B113749: Diff 358279. Jul 13 2021, 9:06 AM

I need to know the current location to distinguish the Fixed-Length Header from the object headers, and I have not found a cleaner solution.

I still don't understand why you need to know it to make this distinction. The two header types are completely different things as far as I understand the spec, used in different contexts. Could you read the Fixed Length header as the first part of reading the archive, and the object headers when reading individual children, then store them as members of the respective classes?

Note that multiple reads have been checked: Rust loads libLLVM.so once and can read multiple archives. It works, as the static variable is initialized correctly before being used. But the reads are sequential.

Just because this works in one single case does not make this an appropriate design - in fact you haven't even tested the case where the issue is, as I bet that the thing you ran read the archives sequentially, not in parallel. What if I wrote a tool that wanted to read multiple Big-format archives in parallel, with one thread per archive? What would happen to the static variable on each individual thread?

Pseudo-code example reading algorithm that doesn't have the issues mentioned:

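#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Simplified stand-ins: the real Big Archive headers store these offsets
// as space-padded ASCII decimal fields, not binary integers.
struct FixedLengthHeader {
  uint64_t FirstChildOffset; // 0 if the archive is empty.
};

struct ObjectHeader {
  uint64_t NextOffset; // 0 if this is the last child.
};

class Archive;

class Child {
public:
  Child(uint64_t Offset, const Archive &Parent);
  bool hasNext() const { return Header.NextOffset != 0; }
  Child nextChild() const { return Child(Header.NextOffset, *Parent); }

  ObjectHeader Header{};
  const uint8_t *Payload = nullptr;
  const Archive *Parent; // Pointer, so that Child stays copy-assignable.
};

class Archive {
public:
  explicit Archive(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {
    // The fixed-length header is read once, when the archive is opened.
    // Bounds checks (and cycle detection on NextOffset) are omitted here.
    std::memcpy(&Header, Data.data(), sizeof(Header));
    if (Header.FirstChildOffset == 0)
      return; // Empty archive.
    Child C(Header.FirstChildOffset, *this);
    Children.push_back(C);
    while (C.hasNext()) {
      C = C.nextChild();
      Children.push_back(C);
    }
  }
  // Children point back at *this, so the Archive must not be copied or moved.
  Archive(const Archive &) = delete;
  Archive &operator=(const Archive &) = delete;

  FixedLengthHeader Header{};
  std::vector<Child> Children;
  std::vector<uint8_t> Data;
};

Child::Child(uint64_t Offset, const Archive &Parent) : Parent(&Parent) {
  // Check Archive data is large enough for offset + header size + payload size.
  std::memcpy(&Header, Parent.Data.data() + Offset, sizeof(Header));
  Payload = Parent.Data.data() + Offset + sizeof(Header);
}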

llvm/lib/Object/Archive.cpp
608	It is currently read and forbidden as length of object in archive in the only information we need. I don't understand this comment at all. Do you mean forgotten, not forbidden? Perhaps you could change how things are read and stored?
llvm/test/Object/archive-big-read.test
1 ↗	(On Diff #348513)	Older tests use older styles. We have developed better styles in the meantime which are preferred for newer tests.

Update D100651.

Hi @jhenderson,
I cannot follow exactly what you propose, because llvm-ar (and maybe other tools) uses child_iterator directly. As this object does not have access to Archive (= Parent), it seems difficult to transfer the Child vector of Archive to child_iterator. Moreover, I want to avoid making Parent non-const in Child.

Following the same idea, though not in exactly the same way, I propose:

Move FixedLengthHeader into Archive, and use it for Big Archive only.
Handle the first read (Fixed-Length Header) during Archive creation, so subsequent Children do not need to care about it (this was one of the reasons for the static variable in the previous version).
To distinguish objects from the Member Table, I cannot use Next Member in the object header, because in the last object it points to the Member Table; and I cannot use the Offset to Member Table from the Fixed-Length Header, because I do not keep track of the current position. So I use a trick: an object must have a name (adding a file with an empty name to an archive is not valid), and the Member Table header has no name, so its name field is just "`\n" (the terminator).
The child_begin_bigarchive() function is equivalent to child_begin(), but it handles the Fixed-Length Header correctly.
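Here is a sketch of the name-based check described in the third point. It assumes, as stated there, that the member table header has a zero ar_namlen, so its name field is just the "`\n" terminator. The byte offsets follow the member-header layout seen in the xxd dumps in this review (ar_namlen at byte 108, name at byte 112). The helper name is hypothetical. When the current offset is known, comparing it with fl_memoff would be a more direct alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

constexpr size_t NamLenOffset = 108; // ar_namlen[4] within a member header.
constexpr size_t NameOffset = 112;   // Name bytes start here, then "`\n".

// Hypothetical helper: true if the header at the start of Hdr has a zero
// ar_namlen and its name field is just the "`\n" terminator.
bool hasEmptyName(std::string_view Hdr) {
  if (Hdr.size() < NameOffset + 2)
    return false;
  std::string_view NamLen = Hdr.substr(NamLenOffset, 4);
  size_t End = NamLen.find_last_not_of(' ');
  if (End == std::string_view::npos || NamLen.substr(0, End + 1) != "0")
    return false;
  return Hdr.substr(NameOffset, 2) == "`\n";
}
```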

getSize() and the Big Archive header initializer are simplified. Differences between regular archives and Big Archives are handled during Child and Archive initialization, so they are cleanly separated.

The Free List is not supported yet: if you remove an object from an archive, the archive becomes unreadable by LLVM. The Free List is complex and not fully documented. I use report_fatal_error() if a Free List is present in the archive (i.e. its offset is not 0) or if it cannot be read.
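The check described above could look roughly like the following sketch, which is not the patch's code. It returns an error message instead of calling report_fatal_error() so it can be exercised standalone, and the function name is hypothetical. fl_freeoff is the last 20-byte field of the 128-byte fixed-length header (bytes 108 to 127).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

constexpr size_t FixedLengthHeaderSize = 128;
constexpr size_t FreeOffField = 108; // fl_freeoff: bytes 108..127.

// Returns an error message if the archive uses a free list or the field is
// unreadable, and std::nullopt if the free-list offset is zero.
std::optional<std::string> checkNoFreeList(std::string_view Ar) {
  if (Ar.size() < FixedLengthHeaderSize)
    return std::string("truncated fixed-length header");
  std::string_view Field = Ar.substr(FreeOffField, 20);
  size_t Last = Field.find_last_not_of(' ');
  if (Last == std::string_view::npos)
    return std::string("fl_freeoff is blank");
  Field = Field.substr(0, Last + 1);
  if (Field.find_first_not_of("0123456789") != std::string_view::npos)
    return "fl_freeoff is not a decimal number: " + std::string(Field);
  if (Field.find_first_not_of('0') != std::string_view::npos)
    return std::string("free list is not supported");
  return std::nullopt;
}
```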

Harbormaster completed remote builds in B115865: Diff 361218. Jul 23 2021, 8:50 AM

Esme added a subscriber: Esme. Aug 17 2021, 5:21 AM

Esme added inline comments.

llvm/test/Object/archive-big-extract.test
1 ↗	(On Diff #361218)	Can you please add the command used to create the ar file to the comment? Based on this patch, I am trying to enable tools (e.g. llvm-readobj) for the ar format, but I get an error from `BigArchiveMemberHeader::getSize()` when applying this patch:
# Create the ar file under AIX
$ ar -v -q 1.a 1.o
$ llvm-ar p 1.a evenlen
llvm-ar: error: unable to load '2.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128)

EGuesnet added inline comments. Aug 17 2021, 5:44 AM

llvm/test/Object/archive-big-extract.test
1 ↗	(On Diff #361218)	Hi @Esme, I just proceeded as follows:
$ # On AIX
$ echo evenlen > evenlen
$ ar -v -q 1.a evenlen
$ # On any server
$ llvm-ar p 1.a evenlen
evenlen
As your error message refers to 2.a, I think something went wrong when reproducing your error.

Esme added inline comments. Sep 7 2021, 11:18 PM

llvm/lib/Object/Archive.cpp
969	It's not correct to calculate the offset of the first archive member as `Data.getBufferStart() + strlen(Magic) + sizeof(Archive::ArFixLenHdrType);`; please use the value of `ArFixLenHdr->FirstArOffset`. You can easily reproduce the bug by testing an archive file that contains an object file member, as in the comment I added before:
$ xlc 1.c -o 1.o
$ ar -v -q 1.a 1.o
$ llvm-ar tv 1.a
llvm-ar: error: unable to load '1.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128)
The correct offset for this case should be 138 instead of 128.

EGuesnet added inline comments. Sep 14 2021, 12:51 AM

llvm/lib/Object/Archive.cpp

969

First, LLVM is no longer a priority for our team, so I cannot spend time on this PR.
Second, I am not able to reproduce the problem. Please give me the content of 1.c and of 1.a.

// small.c
int add_two (int a) {
        return a + 2;
}

$ xlc -v
C for AIX Compiler, Version 5

OK, it is really old, but that does not matter for the archive.

$ xlc small.c  -o small.o -c
$ ar -v -q small.a small.o

$ xxd small.a

00000000: 3c62 6967 6166 3e0a 3130 3730 2020 2020  <bigaf>.1070
00000010: 2020 2020 2020 2020 2020 2020 3132 3332              1232
00000020: 2020 2020 2020 2020 2020 2020 2020 2020
00000030: 3020 2020 2020 2020 2020 2020 2020 2020  0
00000040: 2020 2020 3132 3820 2020 2020 2020 2020      128
00000050: 2020 2020 2020 2020 3132 3820 2020 2020          128
00000060: 2020 2020 2020 2020 2020 2020 3020 2020              0
00000070: 2020 2020 2020 2020 2020 2020 2020 2020
// End of Fixed-Length Header: size is 128
00000080: 3832 3020 2020 2020 2020 2020 2020 2020  820
00000090: 2020 2020 3130 3730 2020 2020 2020 2020      1070
000000a0: 2020 2020 2020 2020 3020 2020 2020 2020          0
000000b0: 2020 2020 2020 2020 2020 2020 3136 3331              1631
000000c0: 3630 3435 3039 2020 3020 2020 2020 2020  604509  0
000000d0: 2020 2020 3020 2020 2020 2020 2020 2020      0
000000e0: 3634 3420 2020 2020 2020 2020 3720 2020  644         7
[...]

$ llvm-ar --version
LLVM version 13.0.0

$ llvm-ar t small.a
small.o # OK

Third, according to the documentation at https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big, the Fixed-Length Header must have a size of 128. 138 is not valid.

Esme added inline comments. Sep 14 2021, 2:43 AM

llvm/lib/Object/Archive.cpp

969

Thanks for your reply!

If you are no longer following up on this patch, I'd be happy to continue your work. What do you think?

In my previous comment, a binary member was added to the archive, while in your case there is an object member (compiled with the -c option) in the archive.

You can also reproduce the issue if you add a dynamic lib to an archive.

$ cat 1.c
int main() {
return 1;
}
$ xlc 1.c -qmkshrobj -o libt.so
$ ar -v -q 1.a libt.so
$ xxd 1.a 
00000000: 3c62 6967 6166 3e0a 3134 3532 2020 2020  <bigaf>.1452    
00000010: 2020 2020 2020 2020 2020 2020 3136 3134              1614
00000020: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000030: 3020 2020 2020 2020 2020 2020 2020 2020  0               
00000040: 2020 2020 3133 3420 2020 2020 2020 2020      134         
00000050: 2020 2020 2020 2020 3133 3420 2020 2020          134     
00000060: 2020 2020 2020 2020 2020 2020 3020 2020              0   
00000070: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000080: 0000 0000 0000 3131 3935 2020 2020 2020  ......1195    
// There is some padding between the Fixed-Length Header and the first member.
00000090: 2020 2020 2020 2020 2020 3134 3532 2020            1452  
000000a0: 2020 2020 2020 2020 2020 2020 2020 3020                0 
000000b0: 2020 2020 2020 2020 2020 2020 2020 2020

It's correct that the Fixed-Length Header must have a size of 128, and 138 is not valid. However, as shown above, there may be some padding between the Fixed-Length Header and the first member, so I think the value of ArFixLenHdr->FirstArOffset is the exact offset of the first member.
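Here is a sketch of the suggested fix: read the first member's offset from fl_fstmoff (bytes 68 to 87 of the fixed-length header) instead of assuming it is 128. The test data reproduces the padded layout from the dump above: fl_fstmoff = 134, six NUL bytes at 128..133, and ar_size = 1195 at 134. Reading ar_size at the fixed offset 128 instead hits the NUL bytes, which is the same kind of "not all decimal numbers" failure reported earlier. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Parses a space-padded ASCII decimal field; any other byte (such as the NUL
// padding shown above) makes it invalid. Overflow is ignored in this sketch.
std::optional<uint64_t> parseDecimal(std::string_view Field) {
  while (!Field.empty() && Field.back() == ' ')
    Field.remove_suffix(1);
  if (Field.empty())
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : Field) {
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + static_cast<uint64_t>(C - '0');
  }
  return Value;
}

// fl_fstmoff occupies bytes 68..87 of the 128-byte fixed-length header.
std::optional<uint64_t> firstMemberOffset(std::string_view Ar) {
  if (Ar.size() < 128)
    return std::nullopt;
  return parseDecimal(Ar.substr(68, 20));
}

// ar_size is the first 20-byte field of a member header.
std::optional<uint64_t> memberSizeAt(std::string_view Ar, uint64_t Offset) {
  if (Offset > Ar.size() || Ar.size() - Offset < 20)
    return std::nullopt;
  return parseDecimal(Ar.substr(Offset, 20));
}
```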

EGuesnet added inline comments. Sep 14 2021, 5:43 AM

llvm/lib/Object/Archive.cpp
969	Using ArFixLenHdr->FirstArOffset does not work: truncated or malformed archive (characters in name length field in archive header are not all decimal numbers: '' for archive member header at offset 68). I do not know where this "68" comes from. In my opinion, this PR is complex enough. It could be accepted with as few changes as possible, i.e. without free-list support or other undocumented features. Later, you can create a new PR to extend Big Archive support, with new tests.

Esme added inline comments. Sep 14 2021, 7:31 PM

llvm/lib/Object/Archive.cpp
969	OK, thanks! Do you get the FirstArOffset value via `StringRef(ArFixLenHdr->FirstArOffset, sizeof(ArFixLenHdr->FirstArOffset)).rtrim(' ').getAsInteger(10, Size)` ?

EGuesnet added inline comments. Sep 15 2021, 3:28 AM

llvm/lib/Object/Archive.cpp
969	My mistake, you are right: I was using FirstArOffset directly. But to handle the padded case, you must add a test with a binary file, and the LLVM community wants to avoid adding binaries to tests. So I still think a separate PR after this one is preferable.

DiggerLin added a subscriber: DiggerLin. Sep 28 2021, 9:16 AM

DiggerLin added inline comments.

llvm/lib/Object/Archive.cpp
106	There is common code in BigArchiveMemberHeader::BigArchiveMemberHeader and ArchiveMemberHeader::ArchiveMemberHeader; we can move the common code into the constructor of AbstractArchiveMemberHeader.
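Here is a minimal sketch of that refactoring. It also reflects two related points raised in this review: the shared Parent member, and the header pointer that is left uninitialized when RawHeaderPtr is null. The error-reporting style (a std::string* out-parameter) and the checks are simplified stand-ins, not the patch's code. The sizes 60 and 112 are the classic and Big Archive member-header sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class Archive; // Only used through pointers here.

class AbstractArchiveMemberHeader {
public:
  virtual ~AbstractArchiveMemberHeader() = default;
  const Archive *getParent() const { return Parent; }
  const char *getRawHeader() const { return RawHeader; }

protected:
  // Checks shared by every format live here once; derived classes only
  // supply the size of their fixed header.
  AbstractArchiveMemberHeader(const Archive *Parent, const char *RawHeaderPtr,
                              size_t Size, size_t HeaderSize, std::string *Err)
      : Parent(Parent) {
    if (RawHeaderPtr == nullptr)
      return; // RawHeader stays nullptr instead of indeterminate.
    if (Size < HeaderSize) {
      if (Err)
        *Err = "remaining size is smaller than the member header";
      return;
    }
    RawHeader = RawHeaderPtr;
  }

private:
  const Archive *Parent;           // Shared member, hoisted into the base.
  const char *RawHeader = nullptr;
};

class ArchiveMemberHeader : public AbstractArchiveMemberHeader {
public:
  ArchiveMemberHeader(const Archive *Parent, const char *Raw, size_t Size,
                      std::string *Err)
      : AbstractArchiveMemberHeader(Parent, Raw, Size, 60, Err) {}
};

class BigArchiveMemberHeader : public AbstractArchiveMemberHeader {
public:
  BigArchiveMemberHeader(const Archive *Parent, const char *Raw, size_t Size,
                         std::string *Err)
      : AbstractArchiveMemberHeader(Parent, Raw, Size, 112, Err) {}
};
```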

DiggerLin added inline comments. Sep 28 2021, 12:16 PM

llvm/include/llvm/Object/Archive.h
315	I do not think we need a special child_begin_bigarchive() API for AIX big archives here; we can use the same API, child_begin(). There is a Format member in Archive, so child_begin() can have a different implementation depending on the value of Format: if Format is K_AIXBIG, use the Big Archive implementation.

Gentle ping: we need this patch. Can you help to speed it up? @EGuesnet

jsji added a reviewer: Restricted Project. Sep 29 2021, 7:48 PM

Hi @DiggerLin,

I do not have time to improve this PR before the beginning of November. We have other priorities right now.
This PR is nearly 6 months old, so it can wait one more month.

DiggerLin added inline comments. Sep 30 2021, 11:52 AM

llvm/include/llvm/Object/Archive.h
98–151	Both ArchiveMemberHeader and BigArchiveMemberHeader have the member "Archive const *Parent;". Can it be moved into AbstractArchiveMemberHeader?
llvm/lib/Object/Archive.cpp
64	If RawHeaderPtr == nullptr, ArMemHdr is left uninitialized here.
110	Same problem as above.
177	The variable name "end" does not express its meaning; maybe "NameSize" or "NameLen"?
319–368	Several places use almost the same code as lines 308 to 320. Can we move these lines into a helper function?
524	Change "Name.size() + Name.size() % 2" to "((Name.size() + 1) >> 1) << 1"?
532	The Name.startswith("#1/") check is not for Archive::K_AIXBIG. What happens here if an AIX big archive has a member whose name begins with "#/"?
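On the rounding suggestion at line 524: both expressions round a length up to the next even number, so the choice between them is a matter of readability. Here is a quick check, using hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>

// Both forms round N up to the next even number (2-byte alignment).
constexpr size_t roundUpEvenMod(size_t N) { return N + N % 2; }
constexpr size_t roundUpEvenShift(size_t N) { return ((N + 1) >> 1) << 1; }

static_assert(roundUpEvenMod(7) == 8 && roundUpEvenShift(7) == 8, "odd: +1");
static_assert(roundUpEvenMod(8) == 8 && roundUpEvenShift(8) == 8, "even: same");
```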

DiggerLin added a reviewer: DiggerLin. Sep 30 2021, 11:52 AM

DiggerLin requested changes to this revision. Sep 30 2021, 2:08 PM

DiggerLin added inline comments.

llvm/lib/Object/Archive.cpp
281	It looks like the parameter Size is not used here?

This revision now requires changes to proceed. Sep 30 2021, 2:08 PM

DiggerLin removed a reviewer: DiggerLin. Sep 30 2021, 2:15 PM

This revision now requires review to proceed. Sep 30 2021, 2:15 PM

DiggerLin added a reviewer: DiggerLin. Sep 30 2021, 2:16 PM

DiggerLin added inline comments. Oct 1 2021, 7:03 AM

llvm/include/llvm/Object/Archive.h
315	Also, with a separate child_begin_bigarchive interface, you cannot use the API `iterator_range<child_iterator> children(Error &Err, bool SkipInternal = true) const { return make_range(child_begin(Err, SkipInternal), child_end()); }`.

The function `file_magic llvm::identify_magic(StringRef Magic)` should be modified to recognize big archives; otherwise object::createBinary will fail for them.
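For reference, the kind of prefix check `identify_magic` would need could look like the sketch below. It is a standalone illustration with a hypothetical `ArchiveMagicKind` enum and `std::string_view` standing in for `StringRef`, not the actual LLVM change. Comparing each magic against its own length also avoids relying on a shared `MAGIC_LEN` define.

```cpp
#include <string_view>

// Hypothetical classification result; LLVM's real enum is llvm::file_magic.
enum class ArchiveMagicKind { Unknown, Regular, Thin, AIXBig };

constexpr std::string_view RegularMagic = "!<arch>\n";
constexpr std::string_view ThinMagic = "!<thin>\n";
constexpr std::string_view BigMagic = "<bigaf>\n"; // Note: no leading '!'.

// True if Buffer starts with Prefix; substr clamps, so short buffers are safe.
inline bool hasPrefix(std::string_view Buffer, std::string_view Prefix) {
  return Buffer.substr(0, Prefix.size()) == Prefix;
}

// Each magic is compared using its own length, so no MAGIC_LEN is needed.
inline ArchiveMagicKind identifyArchiveMagic(std::string_view Buffer) {
  if (hasPrefix(Buffer, RegularMagic))
    return ArchiveMagicKind::Regular;
  if (hasPrefix(Buffer, ThinMagic))
    return ArchiveMagicKind::Thin;
  if (hasPrefix(Buffer, BigMagic))
    return ArchiveMagicKind::AIXBig;
  return ArchiveMagicKind::Unknown;
}
```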

DiggerLin added inline comments.Oct 5 2021, 6:49 AM

llvm/include/llvm/Object/Archive.h
315	Or you can call child_begin_bigarchive() from child_begin(Error &Err, bool SkipInternal = true) when Format is K_AIXBIG.
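A minimal model of the dispatch suggested in these comments: `child_begin()` chooses the implementation from `Format`, so `children()`, which is built on `child_begin()`, works for big archives without callers knowing about a separate entry point. Everything here (`ArchiveModel`, `FirstMemberIndex`, string members) is a hypothetical simplification. The real `child_begin(Error &Err, bool SkipInternal)` also reports errors and skips internal members, which this sketch omits.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for llvm::object::Archive.
class ArchiveModel {
public:
  enum Kind { K_GNU, K_BSD, K_AIXBIG };
  using child_iterator = std::vector<std::string>::const_iterator;

  // FirstMemberIndex stands in for the first-member offset a big archive
  // records in its fixed-length header; other formats ignore it.
  ArchiveModel(Kind Format, std::vector<std::string> Members,
               std::size_t FirstMemberIndex = 0)
      : Format(Format), Members(std::move(Members)),
        FirstMemberIndex(FirstMemberIndex) {}

  // Single public entry point: dispatch on Format instead of exposing a
  // separate child_begin_bigarchive() to callers.
  child_iterator child_begin() const {
    if (Format == K_AIXBIG)
      return childBeginBigArchive();
    return Members.begin();
  }
  child_iterator child_end() const { return Members.end(); }

  // Minimal range type so children() works in a range-based for loop.
  struct ChildRange {
    child_iterator First, Last;
    child_iterator begin() const { return First; }
    child_iterator end() const { return Last; }
  };

  // Built on child_begin(), so it works unchanged for every format.
  ChildRange children() const { return {child_begin(), child_end()}; }

private:
  // Format-specific logic stays private.
  child_iterator childBeginBigArchive() const {
    return Members.begin() + static_cast<std::ptrdiff_t>(FirstMemberIndex);
  }

  Kind Format;
  std::vector<std::string> Members;
  std::size_t FirstMemberIndex;
};
```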

DiggerLin added inline comments.Oct 6 2021, 3:05 PM

llvm/lib/Object/Archive.cpp
319–368	I also wonder why you need the code `std::string Buf; raw_string_ostream OS(Buf); OS.write_escaped(StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" ")); OS.flush();` to convert to std::string. StringRef has a str() function that converts it to std::string; why not use StringRef::str()?
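On the `write_escaped` question: with `StringRef` the conversion would be `StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" ").str()`. One difference worth checking before switching: `write_escaped` also escapes backslashes, quotes and non-printable bytes, which may matter if the string ends up in an error message, whereas `str()` copies the bytes unchanged. The sketch below shows the plain trim-and-copy behaviour, with `std::string_view` standing in for `StringRef`; `fieldToString` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper; std::string_view stands in for llvm::StringRef.
// ar header fields are fixed width and right-padded with spaces, so drop the
// trailing spaces and copy the remaining bytes unchanged (no escaping).
inline std::string fieldToString(const char *Field, std::size_t Width) {
  std::string_view View(Field, Width);
  std::size_t LastNonSpace = View.find_last_not_of(' ');
  if (LastNonSpace == std::string_view::npos)
    return std::string(); // The field is entirely blank.
  return std::string(View.substr(0, LastNonSpace + 1));
}
```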

Can you address the comments and update the patch? We will approve it, and we will create a follow-up patch for the "Support of Big archive" work. @EGuesnet

In my opinion, this PR is complex enough. It could be accepted with as few changes as possible, so without the free list, undocumented features... Later, you can create a new PR to extend Big Archive support, with new tests.

As @EGuesnet mentioned above, I think he won't have time to address the comments in time.
However, we do depend on this patch for AIX in the near future,
so I would suggest that we *accept* this patch *as it is* for now, and commit it as @EGuesnet suggested.

And @DiggerLin will post follow up patches immediately.

Is that OK for reviewers? @jhenderson @rupprecht @Esme ?

In reply to @jsji (D100651#3058324):

This patch is not in a suitable state ready for commit. I've highlighted a large number of issues, and I think there are some fundamental issues with the approach, which need addressing before it is acceptable to land this in LLVM. I haven't finished reviewing the whole patch yet either. I'm sorry if you need the patch in the near future. I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).

llvm/include/llvm/Object/Archive.h
39–40	This comment is superfluous. `clone` always means this.
48–52	`getOffset` isn't named `getRawOffset`, so it doesn't look like it belongs in this block. `getRawName` on the other hand looks like it does belong here. Please make sure all comments follow the existing style. At the moment, comments are doxygen style (see e.g. `getName`). LLVM style also uses trailing full stops on comments - please add them where they are missing. That being said, don't bother with comments that provide no useful benefit beyond the method names. For example, you do not need this comment or the "Non-raw getters" comment: what benefit do they provide?
103	Please order `const` according to standard LLVM style, i.e. `const T *`. If you're changing a function anyway, it makes sense to fix the style.
105	Please separate unrelated method declarations, or methods with non-single-line definitions with a blank line.
177	Do we actually need all these new constructors and assignment operators? Are they used?
179	Under what circumstances can C.Header be a nullptr?
182	Nit: add new line before this method.
183	`Parent` is a raw pointer, there's no need to std::move it.
346	Nit: please use `//` comments in this struct, as per the rest of the code. As I understand it, this is a specific BigArchive class. Please rename it to be clear, e.g. `BigArchiveFixLenHdr`.
354	I think the presence of this member shows that the design is not right. `BigArchive` is a kind of `Archive`. The members specific to BigArchive types (as opposed to e.g. traditional BSD or GNU archives) should not be present in a type that should be generic. I think rather than, or possibly as well as, having this member, you really want to have concrete Archive subclasses that implement the relevant logic, and make Archive an abstract class.
llvm/lib/Object/Archive.cpp
46	I don't like this define. It is not guaranteed that all future archive magic is 8 characters long. I don't think it would be necessary if we made the Archive class a class hierarchy as suggested above. I'm also slightly surprised that BigMagic doesn't start with `!`, but I assume that is correct and intentional.
68	Why has this code moved?
131	Terminator characters are cosmetic only, really, in regular archives too, but we still check them.
173–174	This comment has several typos in, and I'm not really sure what it is actually trying to say.
314–316	I'd prefer that this calculation be factored out into at least a variable, as I don't follow what it represents.
320–322	This pattern is a duplicate of most of the earlier expression. Put it in a variable rather than duplicating code.
402	`getRawName` appears not to trim the trailing spaces, but `getRawUID` does. This seems like a logical inconsistency that will likely lead to bugs.
473	These if/else checks clearly indicate that Child should really be `BigArchiveChild` and something else, sharing a common `Child` abstract base class.
620–625	Revert this change.
659–660	This looks like an unrelated formatting change.
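One possible shape for the hierarchy suggested above, reduced to a standalone sketch: `Archive` becomes abstract with per-format subclasses, `Parent` lives once in `AbstractArchiveMemberHeader`, the base gets a virtual destructor (addressing the clang-tidy non-virtual-dtor warning), and `Child` owns its header through `std::unique_ptr`, with `clone()` returning a smart pointer. The class bodies are simplified stand-ins, not LLVM code; `RegularArchive` and `getFirstChildOffset` are hypothetical names.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

class Archive {
public:
  virtual ~Archive() = default;
  // Each archive flavour knows where its first member header starts.
  virtual uint64_t getFirstChildOffset() const = 0;
};

// Traditional GNU/BSD layout: the first member follows the 8-byte magic.
class RegularArchive : public Archive {
public:
  uint64_t getFirstChildOffset() const override { return 8; }
};

// AIX big archive: the offset comes from the fixed-length header, so this
// state lives in the subclass instead of in the generic Archive.
class BigArchive : public Archive {
public:
  explicit BigArchive(uint64_t FirstChildOffset)
      : FirstChildOffset(FirstChildOffset) {}
  uint64_t getFirstChildOffset() const override { return FirstChildOffset; }

private:
  uint64_t FirstChildOffset;
};

class AbstractArchiveMemberHeader {
public:
  explicit AbstractArchiveMemberHeader(const Archive *Parent) : Parent(Parent) {}
  // Virtual destructor: deleting through a base pointer is well defined.
  virtual ~AbstractArchiveMemberHeader() = default;
  virtual std::string getRawName() const = 0;
  virtual std::unique_ptr<AbstractArchiveMemberHeader> clone() const = 0;
  const Archive *getParent() const { return Parent; }

protected:
  const Archive *Parent; // Stored once for every header kind.
};

class BigArchiveMemberHeader : public AbstractArchiveMemberHeader {
public:
  BigArchiveMemberHeader(const Archive *Parent, std::string Name)
      : AbstractArchiveMemberHeader(Parent), Name(std::move(Name)) {}
  std::string getRawName() const override { return Name; }
  std::unique_ptr<AbstractArchiveMemberHeader> clone() const override {
    return std::make_unique<BigArchiveMemberHeader>(*this);
  }

private:
  std::string Name;
};

// Child owns its header polymorphically; copies deep-copy it via clone().
class Child {
public:
  explicit Child(std::unique_ptr<AbstractArchiveMemberHeader> Header)
      : Header(std::move(Header)) {}
  Child(const Child &Other)
      : Header(Other.Header ? Other.Header->clone() : nullptr) {}
  Child &operator=(const Child &Other) {
    if (this != &Other)
      Header = Other.Header ? Other.Header->clone() : nullptr;
    return *this;
  }
  Child(Child &&) = default; // A moved-from Child is left with a null Header.
  Child &operator=(Child &&) = default;
  std::string getRawName() const { return Header->getRawName(); }
  const Archive *getParent() const { return Header->getParent(); }

private:
  std::unique_ptr<AbstractArchiveMemberHeader> Header;
};
```

In this sketch the only way `Header` becomes null is by moving from a `Child`, which is one possible answer to the "when can C.Header be a nullptr?" question.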

jhenderson requested changes to this revision.Oct 12 2021, 7:55 AM

This revision now requires changes to proceed.Oct 12 2021, 7:55 AM

I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).

Thanks @jhenderson! That would work too. @DiggerLin, let's do it: you can create another patch based on this one and give credit to @EGuesnet if he cannot address the comments in time.

In reply to @jsji (D100651#3058580):

Yes, that is no problem for me. Thanks.

Hi @jhenderson, @jsji , @DiggerLin,
As I said before, I will not have time to improve this PR for at least one or two months, as our priorities have shifted.
My proposal to @Esme about accepting the patch with minor changes was OK, but the reviewers have now requested too many changes.
So I agree that you start an improved / alternative version, as long as you credit my work.

Please tag me when you open your PR.
For your information, I have opened another PR about writing Big Archive here: https://reviews.llvm.org/D104367 .

In reply to @EGuesnet (D100651#3060593):

Great! Thank you for your kindness and your contribution to this, @EGuesnet!

@DiggerLin Please credit and tag @EGuesnet in your new patch. Thanks.

DiggerLin mentioned this in D111889: [AIX] Support of Big archive (read).Oct 15 2021, 7:13 AM

zhijian <zhijian@ca.ibm.com> mentioned this in rG3130134d6e48: [AIX] Support of Big archive (read).Jan 17 2022, 7:37 AM

zhijian <zhijian@ca.ibm.com> mentioned this in rG2164c54315bb: [AIX] Support of Big archive (read).Jan 17 2022, 9:01 AM

zhijian <zhijian@ca.ibm.com> mentioned this in rG4fae93298763: [AIX] Support of Big archive (read).Jan 18 2022, 9:13 AM

Revision Contents

Path                                   Size
llvm/include/llvm/Object/Archive.h     133 lines
llvm/lib/Object/Archive.cpp            275 lines

Diff 338117

llvm/include/llvm/Object/Archive.h

... 28 unchanged lines omitted ...
#include <string>		#include <string>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {
namespace object {		namespace object {

class Archive;		class Archive;

class ArchiveMemberHeader {		class AbstractArchiveMemberHeader {
		Lint (clang-tidy): warning: 'llvm::object::AbstractArchiveMemberHeader' has virtual functions but non-virtual destructor [clang-diagnostic-non-virtual-dtor]
public:		public:
friend class Archive;		friend class Archive;

ArchiveMemberHeader(Archive const *Parent, const char *RawHeaderPtr,
uint64_t Size, Error *Err);
// ArchiveMemberHeader() = default;

/// Get the name without looking up long names.		/// Get the name without looking up long names.
		rupprecht (Done): The clone() method should return a std::unique_ptr<AbstractArchiveMemberHeader> to avoid raw memory management.
		jhenderson (Not Done): This comment is superfluous. `clone` always means this.
Expected<StringRef> getRawName() const;		virtual Expected<StringRef> getRawName() const = 0;

/// Get the name looking up long names.		/// Get the name looking up long names.
Expected<StringRef> getName(uint64_t Size) const;		virtual Expected<StringRef> getName(uint64_t Size) const = 0;

Expected<uint64_t> getSize() const;		virtual Expected<uint64_t> getSize() const = 0;

		virtual uint64_t getOffset() const = 0;
		virtual StringRef getRawAccessMode() const = 0;
		virtual StringRef getRawLastModified() const = 0;
		virtual StringRef getRawUID() const = 0;
		virtual StringRef getRawGID() const = 0;
		jhenderson (Not Done): `getOffset` isn't named `getRawOffset`, so it doesn't look like it belongs in this block. `getRawName` on the other hand looks like it does belong here. Please make sure all comments follow the existing style. At the moment, comments are doxygen style (see e.g. `getName`). LLVM style also uses trailing full stops on comments - please add them where they are missing. That being said, don't bother with comments that provide no useful benefit beyond the method names. For example, you do not need this comment or the "Non-raw getters" comment: what benefit do they provide?

Expected<sys::fs::perms> getAccessMode() const;		Expected<sys::fs::perms> getAccessMode() const;
Expected<sys::TimePoint<std::chrono::seconds>> getLastModified() const;		Expected<sys::TimePoint<std::chrono::seconds>> getLastModified() const;

StringRef getRawLastModified() const {
return StringRef(ArMemHdr->LastModified,
sizeof(ArMemHdr->LastModified)).rtrim(' ');
}

Expected<unsigned> getUID() const;		Expected<unsigned> getUID() const;
Expected<unsigned> getGID() const;		Expected<unsigned> getGID() const;

// This returns the size of the private struct ArMemHdrType		// This returns the size of the private struct ArMemHdrType
uint64_t getSizeOf() const {		virtual uint64_t getSizeOf() const = 0;
		virtual uint64_t getFixSizeOf() const = 0;
		};

		class ArchiveMemberHeader : public AbstractArchiveMemberHeader {
		Lint (clang-tidy): warning: 'llvm::object::ArchiveMemberHeader' has virtual functions but non-virtual destructor [clang-diagnostic-non-virtual-dtor]
		public:
		ArchiveMemberHeader(Archive const *Parent, const char *RawHeaderPtr,
		uint64_t Size, Error *Err);

		Expected<StringRef> getRawName() const override;
		Expected<StringRef> getName(uint64_t Size) const override;
		Expected<uint64_t> getSize() const override;

		uint64_t getOffset() const override;
		StringRef getRawAccessMode() const override;
		StringRef getRawLastModified() const override;
		StringRef getRawUID() const override;
		StringRef getRawGID() const override;

		uint64_t getSizeOf() const override {
		Lint (clang-format): please reformat the code as `uint64_t getSizeOf() const override { return sizeof(ArMemHdrType); }`
return sizeof(ArMemHdrType);		return sizeof(ArMemHdrType);
}		}

		uint64_t getFixSizeOf() const override {
		Lint (clang-format): please reformat the code as `uint64_t getFixSizeOf() const override { return 0; }`
		return 0;
		}

private:		private:
struct ArMemHdrType {		struct ArMemHdrType {
char Name[16];		char Name[16];
char LastModified[12];		char LastModified[12];
char UID[6];		char UID[6];
char GID[6];		char GID[6];
char AccessMode[8];		char AccessMode[8];
char Size[10]; ///< Size of data, not including header or padding.		char Size[10]; ///< Size of data, not including header or padding.
char Terminator[2];		char Terminator[2];
};		};
		ArMemHdrType const *ArMemHdr;
Archive const *Parent;		Archive const *Parent;
		};

		class BigArchiveMemberHeader : public AbstractArchiveMemberHeader {
		Lint (clang-tidy): warning: 'llvm::object::BigArchiveMemberHeader' has virtual functions but non-virtual destructor [clang-diagnostic-non-virtual-dtor]
		public:
		BigArchiveMemberHeader(Archive const *Parent, const char *RawHeaderPtr,
		jhenderson (Not Done): Please order `const` according to standard LLVM style, i.e. `const T *`. If you're changing a function anyway, it makes sense to fix the style.
		uint64_t Size, Error *Err);
		Lint (clang-format): please reformat the code (the `uint64_t Size, Error *Err);` continuation line and the blank line after it).

		jhenderson (Not Done): Please separate unrelated method declarations, or methods with non-single-line definitions, with a blank line.
		Expected<StringRef> getRawName() const override;
		Expected<StringRef> getName(uint64_t Size) const override;
		Expected<uint64_t> getSize() const override;

		uint64_t getOffset() const override;
		StringRef getRawAccessMode() const override;
		StringRef getRawLastModified() const override;
		StringRef getRawUID() const override;
		StringRef getRawGID() const override;

		// This returns the size of the private struct ArMemHdrType
		uint64_t getSizeOf() const override {
		Lint (clang-format): please reformat the code as `uint64_t getSizeOf() const override { return sizeof(ArMemHdrType); }`
		return sizeof(ArMemHdrType);
		}

		uint64_t getFixSizeOf() const override {
		Lint (clang-format): please reformat the code as `uint64_t getFixSizeOf() const override { return sizeof(ArFixLenHdrType); }`
		return sizeof(ArFixLenHdrType);
		}

		private:
		// File Member Header
		struct ArMemHdrType {
		char Size[20];
		char NextOffset[20];
		char PrevOffset[20];
		char LastModified[12];
		char UID[12];
		char GID[12];
		char AccessMode[12];
		char NameLen[4];
		union {
		char Name[2];
		char Terminator[2];
		};
		};

		// AIX Fixed-Length Header (without magic)
		struct ArFixLenHdrType {
		char MemOffset[20]; /* Offset to member table */
		Lint (clang-format): please reformat the code (whitespace in the field comments of this struct; 2 diff lines are omitted in the lint output).
		char GlobSymOffset[20]; /* Offset to global symbol table */
		char GlobSym64Offset[20]; /* Offset to global symbol table for 64-bit objects */
		char FirstArOffset[20]; /* Offset to first archive member */
		char LastArOffset[20]; /* Offset to last archive member */
		char FreeOffset[20]; /* Offset to first member on free list */
		};

		DiggerLin (Not Done): Both ArchiveMemberHeader and BigArchiveMemberHeader have the member `Archive const *Parent;`. Can it be moved into AbstractArchiveMemberHeader?
ArMemHdrType const *ArMemHdr;		ArMemHdrType const *ArMemHdr;
		ArFixLenHdrType const *ArFixLenHdr;
		Archive const *Parent;
};		};

class Archive : public Binary {		class Archive : public Binary {
virtual void anchor();		virtual void anchor();

public:		public:
class Child {		class Child {
friend Archive;		friend Archive;
friend ArchiveMemberHeader;		friend AbstractArchiveMemberHeader;

const Archive *Parent;		const Archive *Parent;
ArchiveMemberHeader Header;		AbstractArchiveMemberHeader *Header;
/// Includes header but not padding byte.		/// Includes header but not padding byte.
StringRef Data;		StringRef Data;
/// Offset from Data to the start of the file.		/// Offset from Data to the start of the file.
uint16_t StartOfFile;		uint16_t StartOfFile;

Expected<bool> isThinMember() const;		Expected<bool> isThinMember() const;

public:		public:
Child(const Archive *Parent, const char *Start, Error *Err);		Child(const Archive *Parent, const char *Start, Error *Err);
Child(const Archive *Parent, StringRef Data, uint16_t StartOfFile);		Child(const Archive *Parent, StringRef Data, uint16_t StartOfFile);

		jhenderson (Not Done): Do we actually need all these new constructors and assignment operators? Are they used?
bool operator ==(const Child &other) const {		bool operator ==(const Child &other) const {
assert(!Parent || !other.Parent || Parent == other.Parent);		assert(!Parent || !other.Parent || Parent == other.Parent);
		jhenderson (Not Done): Under what circumstances can C.Header be a nullptr?
return Data.begin() == other.Data.begin();		return Data.begin() == other.Data.begin();
}		}

		jhenderson (Not Done): Nit: add a new line before this method.
const Archive *getParent() const { return Parent; }		const Archive *getParent() const { return Parent; }
		jhenderson (Not Done): `Parent` is a raw pointer; there's no need to std::move it.
Expected<Child> getNext() const;		Expected<Child> getNext() const;

Expected<StringRef> getName() const;		Expected<StringRef> getName() const;
Expected<std::string> getFullName() const;		Expected<std::string> getFullName() const;
Expected<StringRef> getRawName() const { return Header.getRawName(); }		Expected<StringRef> getRawName() const { return Header->getRawName(); }

Expected<sys::TimePoint<std::chrono::seconds>> getLastModified() const {		Expected<sys::TimePoint<std::chrono::seconds>> getLastModified() const {
return Header.getLastModified();		return Header->getLastModified();
}		}

StringRef getRawLastModified() const {		StringRef getRawLastModified() const {
return Header.getRawLastModified();		return Header->getRawLastModified();
}		}

Expected<unsigned> getUID() const { return Header.getUID(); }		Expected<unsigned> getUID() const { return Header->getUID(); }
		rupprecht (Done): There are a lot of miscellaneous formatting changes in this patch. I formatted everything in e41aaea26238d0a5cb19163863819786e24f0e02, so if you rebase past that, the diff will be cleaner.
Expected<unsigned> getGID() const { return Header.getGID(); }		Expected<unsigned> getGID() const { return Header->getGID(); }

Expected<sys::fs::perms> getAccessMode() const {		Expected<sys::fs::perms> getAccessMode() const {
return Header.getAccessMode();		return Header->getAccessMode();
}		}

/// \return the size of the archive member without the header or padding.		/// \return the size of the archive member without the header or padding.
Expected<uint64_t> getSize() const;		Expected<uint64_t> getSize() const;
/// \return the size in the archive header for this member.		/// \return the size in the archive header for this member.
Expected<uint64_t> getRawSize() const;		Expected<uint64_t> getRawSize() const;

Expected<StringRef> getBuffer() const;		Expected<StringRef> getBuffer() const;
... 82 unchanged lines omitted (context: public:) ...
};		};

Archive(MemoryBufferRef Source, Error &Err);		Archive(MemoryBufferRef Source, Error &Err);
static Expected<std::unique_ptr<Archive>> create(MemoryBufferRef Source);		static Expected<std::unique_ptr<Archive>> create(MemoryBufferRef Source);

/// Size field is 10 decimal digits long		/// Size field is 10 decimal digits long
static const uint64_t MaxMemberSize = 9999999999;		static const uint64_t MaxMemberSize = 9999999999;

enum Kind {		enum Kind {
		Lint (clang-format): please reformat the code as `enum Kind { K_GNU, K_GNU64, K_BSD, K_DARWIN, K_DARWIN64, K_COFF, K_XCOFF };`
K_GNU,		K_GNU,
K_GNU64,		K_GNU64,
K_BSD,		K_BSD,
K_DARWIN,		K_DARWIN,
K_DARWIN64,		K_DARWIN64,
K_COFF		K_COFF,
		K_XCOFF
};		};
		jhenderson (Done): Archives can generally hold files of any format, including non-object files like text files. It seems to me the correct name should be `K_AIXBIG`. What do you think?

Kind kind() const { return (Kind)Format; }		Kind kind() const { return (Kind)Format; }
bool isThin() const { return IsThin; }		bool isThin() const { return IsThin; }

child_iterator child_begin(Error &Err, bool SkipInternal = true) const;		child_iterator child_begin(Error &Err, bool SkipInternal = true) const;
child_iterator child_end() const;		child_iterator child_end() const;
		DiggerLin (Not Done): I do not think we need a special child_begin_bigarchive() API for AIX big archives here; we can use the same child_begin() API. Archive has a Format member, so child_begin() can use a different implementation based on its value: if Format is K_AIXBIG, it uses the big archive implementation.
		DiggerLin (Not Done): Also, with a separate child_begin_bigarchive interface, you cannot use the API `iterator_range<child_iterator> children(Error &Err, bool SkipInternal = true) const { return make_range(child_begin(Err, SkipInternal), child_end()); }`.
		DiggerLin (Not Done): Or you can call child_begin_bigarchive() from child_begin(Error &Err, bool SkipInternal = true) when Format is K_AIXBIG.
iterator_range<child_iterator> children(Error &Err,		iterator_range<child_iterator> children(Error &Err,
bool SkipInternal = true) const {		bool SkipInternal = true) const {
return make_range(child_begin(Err, SkipInternal), child_end());		return make_range(child_begin(Err, SkipInternal), child_end());
}		}

symbol_iterator symbol_begin() const;		symbol_iterator symbol_begin() const;
symbol_iterator symbol_end() const;		symbol_iterator symbol_end() const;
iterator_range<symbol_iterator> symbols() const {		iterator_range<symbol_iterator> symbols() const {
... 13 unchanged lines omitted (context: public:) ...
StringRef getSymbolTable() const { return SymbolTable; }		StringRef getSymbolTable() const { return SymbolTable; }
StringRef getStringTable() const { return StringTable; }		StringRef getStringTable() const { return StringTable; }
uint32_t getNumberOfSymbols() const;		uint32_t getNumberOfSymbols() const;

std::vector<std::unique_ptr<MemoryBuffer>> takeThinBuffers() {		std::vector<std::unique_ptr<MemoryBuffer>> takeThinBuffers() {
return std::move(ThinBuffers);		return std::move(ThinBuffers);
}		}

		// Total length is needed, because end of file is member table and
		// global symbol table.
		jhenderson (Not Done): Nit: please use `//` comments in this struct, as per the rest of the code. As I understand it, this is a specific BigArchive class. Please rename it to be clear, e.g. `BigArchiveFixLenHdr`.
		uint32_t Length = 0;
		jhenderson (Done): Do you really need this length? Why is it `uint32_t`?
		EGuesnet (author, Done): The maximal offset (in the fixed-length header) is stored as 20 decimal digits. So, maximal length is approximately 10^10. It is the total length of the archive, so it must be uint64_t, not uint32_t.
		// All offset are global offset. So, we need to memorize position.
		static uint32_t CurrentLocation;
		// Fixed length header is treated differently
		rupprecht (Not Done): Does this need to be static? E.g., this could break if any tool needed to read two separate big archives.
		EGuesnet (author, Done): As far as I know, a static variable is needed. The location must be modified by Child and stored in Parent. As Child cannot modify Parent, this is the easiest way to deal with it. I agree it could break other tools that read two separate big archives, but I have tested this case: Rust reads lots of separate archives, and probably reads some of them twice. And it works. The first version was broken due to static variables, but it is now correctly implemented.
		jhenderson (Not Done): This is not thread safe at all - if a tool parses two archives simultaneously, this will fail horribly. Do you really need it, though? See my comments further down. From reading the spec, the archive uses offsets everywhere, and it should be simple to use the values stored in the member headers.
		EGuesnet (author, Done): The main trouble is with the first read: distinguishing the fixed-length header from the other headers. We must know the current location and modify it during the read. This cannot be done in Parent (because of const) or in Header (not accessible when it is required). A static variable in Archive is the least bad solution I have found.
		static uint32_t fixLengthHeader;
		Lint (clang-tidy): warning: invalid case style for variable 'fixLengthHeader' [readability-identifier-naming]

private:		private:
StringRef SymbolTable;		StringRef SymbolTable;
		jhenderson (Not Done): I think the presence of this member shows that the design is not right. `BigArchive` is a kind of `Archive`. The members specific to BigArchive types (as opposed to e.g. traditional BSD or GNU archives) should not be present in a type that should be generic. I think rather than, or possibly as well as, having this member, you really want to have concrete Archive subclasses that implement the relevant logic, and make Archive an abstract class.
StringRef StringTable;		StringRef StringTable;

StringRef FirstRegularData;		StringRef FirstRegularData;
uint16_t FirstRegularStartOfFile = -1;		uint16_t FirstRegularStartOfFile = -1;
void setFirstRegular(const Child &C);		void setFirstRegular(const Child &C);

unsigned Format : 3;		unsigned Format : 3;
unsigned IsThin : 1;		unsigned IsThin : 1;
mutable std::vector<std::unique_ptr<MemoryBuffer>> ThinBuffers;		mutable std::vector<std::unique_ptr<MemoryBuffer>> ThinBuffers;
};		};

} // end namespace object		} // end namespace object
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_OBJECT_ARCHIVE_H		#endif // LLVM_OBJECT_ARCHIVE_H
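On the `static CurrentLocation` discussion: each member header already records the offset of the next member, so iteration can follow those offsets with purely local state. The standalone sketch below walks a big archive from `FirstArOffset` to `LastArOffset` using the field widths from the structs in this diff. Assumptions: fields are decimal ASCII padded with trailing spaces, the walk stops at `LastArOffset`, a zero `FirstArOffset` means "no members", and all helper names are hypothetical. A 20-digit field can encode up to 10^20 - 1, more than `UINT64_MAX` (about 1.8e19), so the parser uses `uint64_t` and rejects overflow rather than truncating.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Positions derived from the structs in this diff (hypothetical names).
constexpr std::size_t BigMagicLen = 8;                         // "<bigaf>\n"
constexpr std::size_t FirstArOffsetPos = BigMagicLen + 3 * 20; // FirstArOffset
constexpr std::size_t LastArOffsetPos = BigMagicLen + 4 * 20;  // LastArOffset
constexpr std::size_t NextOffsetPos = 20;                      // after Size[20]
constexpr std::size_t NameLenPos = 3 * 20 + 4 * 12; // offsets, date, ids, mode
constexpr std::size_t NamePos = NameLenPos + 4;     // after NameLen[4]

// Parse a decimal field padded with trailing spaces. Returns std::nullopt
// for a blank field, a non-digit, or uint64_t overflow.
inline std::optional<uint64_t> parseDecimalField(std::string_view Field) {
  std::size_t End = Field.find_last_not_of(' ');
  if (End == std::string_view::npos)
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : Field.substr(0, End + 1)) {
    if (C < '0' || C > '9')
      return std::nullopt;
    uint64_t Digit = static_cast<uint64_t>(C - '0');
    if (Value > (UINT64_MAX - Digit) / 10)
      return std::nullopt;
    Value = Value * 10 + Digit;
  }
  return Value;
}

// Bounds-checked read of one fixed-width decimal field.
inline std::optional<uint64_t> readField(std::string_view Buf, uint64_t Pos,
                                         std::size_t Width) {
  if (Pos > Buf.size() || Buf.size() - Pos < Width)
    return std::nullopt;
  return parseDecimalField(Buf.substr(Pos, Width));
}

// Collect member names by following NextOffset from FirstArOffset until
// LastArOffset. All state is local, so two archives can be read at once.
inline std::optional<std::vector<std::string>>
listBigArchiveMembers(std::string_view Buf) {
  std::optional<uint64_t> First = readField(Buf, FirstArOffsetPos, 20);
  std::optional<uint64_t> Last = readField(Buf, LastArOffsetPos, 20);
  if (!First || !Last)
    return std::nullopt;
  std::vector<std::string> Names;
  if (*First == 0)
    return Names; // Assumption: a zero first-member offset means no members.
  uint64_t Offset = *First;
  // A member header needs at least NamePos bytes, which bounds the number of
  // hops and rejects a NextOffset chain that loops.
  for (std::size_t Hops = Buf.size() / NamePos + 1; Hops != 0; --Hops) {
    if (Offset >= Buf.size())
      return std::nullopt;
    std::optional<uint64_t> NameLen = readField(Buf, Offset + NameLenPos, 4);
    // If readField succeeded, Offset + NamePos <= Buf.size() holds here.
    if (!NameLen || *NameLen > Buf.size() - (Offset + NamePos))
      return std::nullopt;
    Names.emplace_back(Buf.substr(Offset + NamePos, *NameLen));
    if (Offset == *Last)
      return Names;
    std::optional<uint64_t> Next = readField(Buf, Offset + NextOffsetPos, 20);
    if (!Next)
      return std::nullopt;
    Offset = *Next;
  }
  return std::nullopt; // Too many hops: the chain never reached LastArOffset.
}
```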

llvm/lib/Object/Archive.cpp

Show All 34 Lines

#include <system_error> #include <system_error>

using namespace llvm; using namespace llvm;

using namespace object; using namespace object;

using namespace llvm::support::endian; using namespace llvm::support::endian;

const char Magic[] = "!<arch>\n"; const char Magic[] = "!<arch>\n";

const char ThinMagic[] = "!<thin>\n"; const char ThinMagic[] = "!<thin>\n";

const char BigMagic[] = "<bigaf>\n";

// All magic are 8 caractere long

#define MAGIC_LEN 8

jhenderson (Not Done):

I don't like this define. It is not guaranteed that all future archive magic is 8 characters long. I don't think it would be necessary if we made the Archive class a class hierarchy as suggested above.

I'm also slightly surprised that BigMagic doesn't start with !, but I assume that is correct and intentional.
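
One way to drop the macro, sketched under the assumption that each magic string is held as a `std::string_view` (in LLVM, `StringRef` offers the same `size()` and prefix comparison): every comparison uses the length of the literal it checks, so a future format with a different magic length needs no shared constant. `ArchiveKind` and `identifyArchive` are hypothetical names. For what it's worth, `<bigaf>\n` matches the big-archive magic defined in AIX's `<ar.h>`, so the missing `!` is expected.

```cpp
#include <cassert>
#include <string_view>

enum class ArchiveKind { Unknown, Regular, Thin, AIXBig };

// Magic strings as they appear in the patch.
constexpr std::string_view Magic = "!<arch>\n";
constexpr std::string_view ThinMagic = "!<thin>\n";
constexpr std::string_view BigMagic = "<bigaf>\n";

ArchiveKind identifyArchive(std::string_view Data) {
  // Each check uses the length of its own magic; substr clamps the
  // count, so a buffer shorter than the magic simply fails to match.
  auto HasPrefix = [Data](std::string_view M) {
    return Data.substr(0, M.size()) == M;
  };
  if (HasPrefix(Magic))
    return ArchiveKind::Regular;
  if (HasPrefix(ThinMagic))
    return ArchiveKind::Thin;
  if (HasPrefix(BigMagic))
    return ArchiveKind::AIXBig;
  return ArchiveKind::Unknown;
}
```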

uint32_t Archive::CurrentLocation = 0;

uint32_t Archive::fixLengthHeader = 2;

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'fixLengthHeader' [readability-identifier-naming]

void Archive::anchor() {} void Archive::anchor() {}

static Error static Error

malformedError(Twine Msg) { malformedError(Twine Msg) {

std::string StringMsg = "truncated or malformed archive (" + Msg.str() + ")"; std::string StringMsg = "truncated or malformed archive (" + Msg.str() + ")";

return make_error<GenericBinaryError>(std::move(StringMsg), return make_error<GenericBinaryError>(std::move(StringMsg),

object_error::parse_failed); object_error::parse_failed);

} }

ArchiveMemberHeader::ArchiveMemberHeader(const Archive *Parent, ArchiveMemberHeader::ArchiveMemberHeader(const Archive *Parent,

const char *RawHeaderPtr, const char *RawHeaderPtr,

uint64_t Size, Error *Err) uint64_t Size, Error *Err)

: Parent(Parent), : Parent(Parent) {

ArMemHdr(reinterpret_cast<const ArMemHdrType *>(RawHeaderPtr)) {

if (RawHeaderPtr == nullptr) if (RawHeaderPtr == nullptr)

DiggerLin (Not Done):

If `RawHeaderPtr == nullptr` is true, the value of `ArMemHdr` will be indeterminate (random) here.
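
A minimal sketch of one fix, using hypothetical reduced types (`ArMemHdrType` here has only a `Name` field, and `MemberHeaderSketch` is not an LLVM class): give the pointer a default member initializer so the early `return` leaves it as `nullptr` rather than indeterminate. Keeping the `reinterpret_cast` in the member-initializer list, as the pre-patch code did, has the same effect, because casting a null `RawHeaderPtr` yields a null pointer.

```cpp
#include <cassert>

// Hypothetical reduced header layout; only the pointer handling matters.
struct ArMemHdrType {
  char Name[16];
};

class MemberHeaderSketch {
public:
  explicit MemberHeaderSketch(const char *RawHeaderPtr) {
    if (RawHeaderPtr == nullptr)
      return; // ArMemHdr keeps its nullptr default, not an indeterminate value
    ArMemHdr = reinterpret_cast<const ArMemHdrType *>(RawHeaderPtr);
  }
  bool isSentinel() const { return ArMemHdr == nullptr; }

private:
  const ArMemHdrType *ArMemHdr = nullptr;
};
```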

return; return;

ErrorAsOutParameter ErrAsOutParam(Err); ErrorAsOutParameter ErrAsOutParam(Err);

ArMemHdr = reinterpret_cast<const ArMemHdrType *>(RawHeaderPtr);

jhenderson (Not Done):

Why has this code moved?

if (Size < sizeof(ArMemHdrType)) { if (Size < sizeof(ArMemHdrType)) {

if (Err) { if (Err) {

std::string Msg("remaining size of archive too small for next archive " std::string Msg("remaining size of archive too small for next archive "

"member header "); "member header ");

Expected<StringRef> NameOrErr = getName(Size); Expected<StringRef> NameOrErr = getName(Size);

if (!NameOrErr) { if (!NameOrErr) {

consumeError(NameOrErr.takeError()); consumeError(NameOrErr.takeError());

uint64_t Offset = RawHeaderPtr - Parent->getData().data(); uint64_t Offset = RawHeaderPtr - Parent->getData().data();

Show All 20 Lines if (Err) {

*Err = malformedError(Msg + "at offset " + Twine(Offset)); *Err = malformedError(Msg + "at offset " + Twine(Offset));

} else } else

*Err = malformedError(Msg + "for " + NameOrErr.get()); *Err = malformedError(Msg + "for " + NameOrErr.get());

} }

return; return;

} }

BigArchiveMemberHeader::BigArchiveMemberHeader(const Archive *Parent,

DiggerLin (Not Done):

There is common code in BigArchiveMemberHeader::BigArchiveMemberHeader and ArchiveMemberHeader::ArchiveMemberHeader; we can move that common code into the constructor of AbstractArchiveMemberHeader.
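
A sketch of that refactoring under simplifying assumptions: `std::string *` stands in for `llvm::Error *`, the class names are hypothetical, and each subclass only supplies its minimum header size. The 60-byte value is the classic `struct ar_hdr`; the 112-byte value for the big-archive header is an assumption (the fixed fields `ar_size` through `ar_namlen` in AIX's `<ar.h>`) and should be checked before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The checks both constructors repeat run once in the base class.
class AbstractMemberHeaderSketch {
public:
  virtual ~AbstractMemberHeaderSketch() = default;
  const char *raw() const { return Raw; }

protected:
  AbstractMemberHeaderSketch(const char *RawHeaderPtr, uint64_t Size,
                             uint64_t MinSize, std::string *Err)
      : Raw(RawHeaderPtr) {
    if (RawHeaderPtr == nullptr)
      return; // sentinel child: nothing to validate
    if (Size < MinSize && Err)
      *Err = "remaining size of archive too small for next archive "
             "member header";
  }

private:
  const char *Raw = nullptr;
};

class RegularMemberHeaderSketch final : public AbstractMemberHeaderSketch {
public:
  static constexpr uint64_t MinSize = 60; // sizeof(struct ar_hdr)
  RegularMemberHeaderSketch(const char *RawHeaderPtr, uint64_t Size,
                            std::string *Err)
      : AbstractMemberHeaderSketch(RawHeaderPtr, Size, MinSize, Err) {}
};

class BigMemberHeaderSketch final : public AbstractMemberHeaderSketch {
public:
  // Assumption: size of the fixed big-archive member fields; verify.
  static constexpr uint64_t MinSize = 112;
  BigMemberHeaderSketch(const char *RawHeaderPtr, uint64_t Size,
                        std::string *Err)
      : AbstractMemberHeaderSketch(RawHeaderPtr, Size, MinSize, Err) {}
};
```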

const char *RawHeaderPtr,

Lint: Pre-merge checks

clang-format: please reformat the code

-                                         const char *RawHeaderPtr,
-                                         uint64_t Size, Error *Err)
+                                               const char *RawHeaderPtr,
+                                               uint64_t Size, Error *Err)

uint64_t Size, Error *Err)

: Parent(Parent) {

if (RawHeaderPtr == nullptr)

DiggerLin (Not Done):

Same problem as above.

return;

ErrorAsOutParameter ErrAsOutParam(Err);

if (Parent->fixLengthHeader) {

// AIX big archive Fixed-Length Header

ArFixLenHdr = reinterpret_cast<const ArFixLenHdrType *>(RawHeaderPtr);

--Archive::fixLengthHeader;

// We want File member archive only for ArMemHdr.

const char *RawMemberHeaderPtr = RawHeaderPtr + sizeof(ArFixLenHdrType);

ArMemHdr = reinterpret_cast<const ArMemHdrType *>(RawMemberHeaderPtr);

} else {

// AIX without Fixed Size Header.

ArMemHdr = reinterpret_cast<const ArMemHdrType *>(RawHeaderPtr);

}

if (Size < sizeof(ArMemHdrType)) {

if (Err) {

std::string Msg("remaining size of archive too small for next archive "

"member header ");

Expected<StringRef> NameOrErr = getName(Size);

if (!NameOrErr) {

jhenderson (Not Done):

Terminator characters are cosmetic only, really, in regular archives too, but we still check them.
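
If the big-archive terminator is to be checked the same way, the check itself is small. In this sketch, `hasMemberTerminator` is a hypothetical helper and the position of the terminator is left to the caller (`TerminatorOffset`); the patch's own comment only says that the two-byte terminator (a backquote followed by a newline) comes after the name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true if Data holds the "`\n" terminator at TerminatorOffset.
// A truncated buffer is reported as "no terminator" instead of being read
// out of bounds.
bool hasMemberTerminator(std::string_view Data, std::size_t TerminatorOffset) {
  constexpr std::string_view Terminator = "`\n";
  if (TerminatorOffset > Data.size() ||
      Data.size() - TerminatorOffset < Terminator.size())
    return false; // not enough bytes left for the terminator
  return Data.substr(TerminatorOffset, Terminator.size()) == Terminator;
}
```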

consumeError(NameOrErr.takeError());

uint64_t Offset = RawHeaderPtr - Parent->getData().data();

*Err = malformedError(Msg + "at offset " + Twine(Offset));

} else

*Err = malformedError(Msg + "for " + NameOrErr.get());

}

return;

}

// Terminator is cosmetic only for big archive

}

// This gets the raw name from the ArMemHdr->Name field and checks that it is // This gets the raw name from the ArMemHdr->Name field and checks that it is

// valid for the kind of archive. If it is not valid it returns an Error. // valid for the kind of archive. If it is not valid it returns an Error.

Expected<StringRef> ArchiveMemberHeader::getRawName() const { Expected<StringRef> ArchiveMemberHeader::getRawName() const {

char EndCond; char EndCond;

auto Kind = Parent->kind(); auto Kind = Parent->kind();

if (Kind == Archive::K_BSD || Kind == Archive::K_DARWIN64) { if (Kind == Archive::K_BSD || Kind == Archive::K_DARWIN64) {

if (ArMemHdr->Name[0] == ' ') { if (ArMemHdr->Name[0] == ' ') {

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) - uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -

Show All 12 Lines Expected<StringRef> ArchiveMemberHeader::getRawName() const {

if (end == StringRef::npos) if (end == StringRef::npos)

end = sizeof(ArMemHdr->Name); end = sizeof(ArMemHdr->Name);

assert(end <= sizeof(ArMemHdr->Name) && end > 0); assert(end <= sizeof(ArMemHdr->Name) && end > 0);

// Don't include the EndCond if there is one. // Don't include the EndCond if there is one.

return StringRef(ArMemHdr->Name, end); return StringRef(ArMemHdr->Name, end);

} }

// This gets the name looking up long names. Size is the size of the archive // This gets the name looking up long names. Size is the size of the archive

Expected<StringRef> BigArchiveMemberHeader::getRawName() const {

// Name is outside ArMemHdr, and there is no end caracter

// name lenght is in NameLen field

jhenderson (Not Done):

This comment has several typos in, and I'm not really sure what it is actually trying to say.

// The two first char of name are already in ArMemHdrType

// but unused terminator '`\n' is after the name.

StringRef::size_type end = strtol(ArMemHdr->NameLen, NULL, 10);

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'end' [readability-identifier-naming]

DiggerLin (Not Done):

The variable name "end" does not express its meaning; maybe "NameSize" or "NameLen"?
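
Beyond the name, `strtol` is risky on this field: `NameLen` is fixed-width and not NUL-terminated, so `strtol` keeps reading into the name bytes that follow whenever they start with digits, and with a null end pointer a malformed field silently yields 0 or a partial value. The other fields in the patch go through `StringRef(...).rtrim(" ").getAsInteger(10, ...)`, which stays inside the field. For comparison, a self-contained sketch of a bounded parser (`parseDecimalField` is a hypothetical helper, not an LLVM API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Parses a space-padded decimal field that is not NUL-terminated, reading
// only its own Width bytes. Returns std::nullopt for an empty field, a
// non-digit character, or a value that does not fit in uint64_t.
std::optional<uint64_t> parseDecimalField(const char *Field, std::size_t Width) {
  std::string_view S(Field, Width);
  while (!S.empty() && S.back() == ' ')
    S.remove_suffix(1); // trailing padding, like rtrim(' ')
  if (S.empty())
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt;
    uint64_t Digit = static_cast<uint64_t>(C - '0');
    if (Value > (std::numeric_limits<uint64_t>::max() - Digit) / 10)
      return std::nullopt; // would overflow
    Value = Value * 10 + Digit;
  }
  return Value;
}
```

On the bytes `"00041.o"` (a 4-byte field `0004` followed by the name `1.o`), `strtol` would read `00041` and return 41, while the bounded parser returns 4.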

return StringRef(ArMemHdr->Name, end);

}

// member including the header, so the size of any name following the header // member including the header, so the size of any name following the header

// is checked to make sure it does not overflow. // is checked to make sure it does not overflow.

Expected<StringRef> ArchiveMemberHeader::getName(uint64_t Size) const { Expected<StringRef> ArchiveMemberHeader::getName(uint64_t Size) const {

// This can be called from the ArchiveMemberHeader constructor when the // This can be called from the ArchiveMemberHeader constructor when the

// archive header is truncated to produce an error message with the name. // archive header is truncated to produce an error message with the name.

// Make sure the name field is not truncated. // Make sure the name field is not truncated.

if (Size < offsetof(ArMemHdrType, Name) + sizeof(ArMemHdr->Name)) { if (Size < offsetof(ArMemHdrType, Name) + sizeof(ArMemHdr->Name)) {

▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines Expected<StringRef> ArchiveMemberHeader::getName(uint64_t Size) const {

// It is not a long name so trim the blanks at the end of the name. // It is not a long name so trim the blanks at the end of the name.

if (Name[Name.size() - 1] != '/') if (Name[Name.size() - 1] != '/')

return Name.rtrim(' '); return Name.rtrim(' ');

// It's a simple name. // It's a simple name.

return Name.drop_back(1); return Name.drop_back(1);

} }

Expected<StringRef> BigArchiveMemberHeader::getName(uint64_t Size) const {

DiggerLin (Not Done):

It looks like the parameter Size is not used here?

// Size check is different with Big Archive TODO

// The raw name itself can be invalid.

Expected<StringRef> NameOrErr = getRawName();

if (!NameOrErr)

return NameOrErr.takeError();

StringRef Name = NameOrErr.get();

// Trim the blanks at the end of the name.

return Name.rtrim(' ');

}

Expected<uint64_t> ArchiveMemberHeader::getSize() const { Expected<uint64_t> ArchiveMemberHeader::getSize() const {

uint64_t Ret; uint64_t Size;

if (StringRef(ArMemHdr->Size, if (StringRef(ArMemHdr->Size,

Lint: Pre-merge checks

clang-format: please reformat the code

-  if (StringRef(ArMemHdr->Size,
-                sizeof(ArMemHdr->Size)).rtrim(" ").getAsInteger(10, Size)) {
+  if (StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size))
+          .rtrim(" ")
+          .getAsInteger(10, Size)) {

sizeof(ArMemHdr->Size)).rtrim(" ").getAsInteger(10, Ret)) { sizeof(ArMemHdr->Size)).rtrim(" ").getAsInteger(10, Size)) {

std::string Buf; std::string Buf;

raw_string_ostream OS(Buf); raw_string_ostream OS(Buf);

OS.write_escaped(StringRef(ArMemHdr->Size, OS.write_escaped(StringRef(ArMemHdr->Size,

sizeof(ArMemHdr->Size)).rtrim(" ")); sizeof(ArMemHdr->Size)).rtrim(" "));

OS.flush(); OS.flush();

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) - uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -

Parent->getData().data(); Parent->getData().data();

return malformedError("characters in size field in archive header are not " return malformedError("characters in size field in archive header are not "

"all decimal numbers: '" + Buf + "' for archive " "all decimal numbers: '" + Buf + "' for archive "

"member header at offset " + Twine(Offset)); "member header at offset " + Twine(Offset));

} }

return Ret; return Size;

} }

Expected<sys::fs::perms> ArchiveMemberHeader::getAccessMode() const { Expected<uint64_t> BigArchiveMemberHeader::getSize() const {

unsigned Ret; uint64_t Size;

if (StringRef(ArMemHdr->AccessMode, uint64_t NameLen;

sizeof(ArMemHdr->AccessMode)).rtrim(' ').getAsInteger(8, Ret)) { if (StringRef(ArMemHdr->Size,

Lint: Pre-merge checks

clang-format: please reformat the code

-  if (StringRef(ArMemHdr->Size,
-                sizeof(ArMemHdr->Size)).rtrim(" ").getAsInteger(10, Size) ||
-      (Parent->kind() == Archive::K_XCOFF && StringRef(ArMemHdr->NameLen,
-                sizeof(ArMemHdr->NameLen)).rtrim(" ").getAsInteger(10, NameLen))) {
+  if (StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size))
+          .rtrim(" ")
+          .getAsInteger(10, Size) ||
+      (Parent->kind() == Archive::K_XCOFF &&
+       StringRef(ArMemHdr->NameLen, sizeof(ArMemHdr->NameLen))
+           .rtrim(" ")
+           .getAsInteger(10, NameLen))) {

sizeof(ArMemHdr->Size)).rtrim(" ").getAsInteger(10, Size) ||

jhenderson (Not Done):

I'd prefer that this calculation be factored out into at least a variable, as I don't follow what it represents.
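
A sketch of what naming the pieces of that condition might look like, assuming the two fields are handed over as `std::string_view`s. `parseField`, `BigSizeFields`, and `readBigSizeFields` are hypothetical names, `std::from_chars` stands in for `StringRef::getAsInteger`, and `IsBigArchive` plays the role of `Parent->kind() == Archive::K_XCOFF`.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// True if the space-padded field is entirely decimal; the value goes to Out.
bool parseField(std::string_view Field, uint64_t &Out) {
  while (!Field.empty() && Field.back() == ' ')
    Field.remove_suffix(1);
  if (Field.empty())
    return false;
  const char *End = Field.data() + Field.size();
  auto [Ptr, Ec] = std::from_chars(Field.data(), End, Out);
  return Ec == std::errc() && Ptr == End; // reject trailing garbage
}

struct BigSizeFields {
  uint64_t Size = 0;
  uint64_t NameLen = 0;
};

// The getSize() condition split into named steps.
bool readBigSizeFields(std::string_view SizeField,
                       std::string_view NameLenField, bool IsBigArchive,
                       BigSizeFields &Out) {
  const bool SizeIsDecimal = parseField(SizeField, Out.Size);
  // NameLen only exists in big-archive member headers.
  const bool NameLenIsDecimal =
      !IsBigArchive || parseField(NameLenField, Out.NameLen);
  return SizeIsDecimal && NameLenIsDecimal;
}
```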

(Parent->kind() == Archive::K_XCOFF && StringRef(ArMemHdr->NameLen,

sizeof(ArMemHdr->NameLen)).rtrim(" ").getAsInteger(10, NameLen))) {

std::string Buf; std::string Buf;

raw_string_ostream OS(Buf); raw_string_ostream OS(Buf);

OS.write_escaped(StringRef(ArMemHdr->AccessMode, OS.write_escaped(StringRef(ArMemHdr->Size,

Lint: Pre-merge checks

clang-format: please reformat the code

-    OS.write_escaped(StringRef(ArMemHdr->Size,
-                               sizeof(ArMemHdr->Size)).rtrim(" "));
+    OS.write_escaped(
+        StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" "));

sizeof(ArMemHdr->AccessMode)).rtrim(" ")); sizeof(ArMemHdr->Size)).rtrim(" "));

jhenderson (Not Done):

This pattern is a duplicate of most of the earlier expression. Put it in a variable rather than duplicating code.

OS.flush(); OS.flush();

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) - uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -

Parent->getData().data(); Parent->getData().data();

return malformedError("characters in size field in archive header are not "

"all decimal numbers: '" + Buf + "' for archive "

Lint: Pre-merge checks

clang-format: please reformat the code

-                          "all decimal numbers: '" + Buf + "' for archive "
-                          "member header at offset " + Twine(Offset));
-  }
-  // First read: size is header + object size + name round to be even + Fixed-Length Header + magic
-  // header is added automatically
+                          "all decimal numbers: '" +
+                          Buf +
+                          "' for archive "
+                          "member header at offset " +
+                          Twine(Offset));

3 diff lines are omitted. See full path.

"member header at offset " + Twine(Offset));

}

// First read: size is header + object size + name round to be even + Fixed-Length Header + magic

// header is added automatically

if (Parent->CurrentLocation == 0)

return Size + NameLen + NameLen%2 + sizeof(ArFixLenHdrType);

Lint: Pre-merge checks

clang-format: please reformat the code

-    return Size + NameLen + NameLen%2 + sizeof(ArFixLenHdrType);
+    return Size + NameLen + NameLen % 2 + sizeof(ArFixLenHdrType);

// Next read: size is header + object size + name round to be even

else

return Size + NameLen + NameLen%2;

Lint: Pre-merge checks

clang-format: please reformat the code

-    return Size + NameLen + NameLen%2;
+    return Size + NameLen + NameLen % 2;

}

uint64_t ArchiveMemberHeader::getOffset() const {

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -

Lint: Pre-merge checks

clang-format: please reformat the code

-  uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -
-                      Parent->getData().data();
+  uint64_t Offset =
+      reinterpret_cast<const char *>(ArMemHdr) - Parent->getData().data();

Parent->getData().data();

return Offset;

}

uint64_t BigArchiveMemberHeader::getOffset() const {

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -

Lint: Pre-merge checks

clang-format: please reformat the code

-  uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) -
-                      Parent->getData().data();
+  uint64_t Offset =
+      reinterpret_cast<const char *>(ArMemHdr) - Parent->getData().data();

Parent->getData().data();

return Offset;

}

// This gets the raw name from the ArMemHdr->AccessMode field.

StringRef ArchiveMemberHeader::getRawAccessMode() const {

return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode)).rtrim(' ');

Lint: Pre-merge checks

clang-format: please reformat the code

-  return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode)).rtrim(' ');
+  return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode))
+      .rtrim(' ');

}

StringRef BigArchiveMemberHeader::getRawAccessMode() const {

return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode)).rtrim(' ');

Lint: Pre-merge checks

clang-format: please reformat the code

-  return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode)).rtrim(' ');
+  return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode))
+      .rtrim(' ');

}

Expected<sys::fs::perms> AbstractArchiveMemberHeader::getAccessMode() const {

unsigned Ret;

if (getRawAccessMode().getAsInteger(8, Ret)) {

std::string Buf;

raw_string_ostream OS(Buf);

OS.write_escaped(getRawAccessMode());

OS.flush();

uint64_t Offset = getOffset();

return malformedError("characters in AccessMode field in archive header " return malformedError("characters in AccessMode field in archive header "

DiggerLin (Not Done):

Several places use almost the same code as lines 308 to 320. Can we move these lines into a helper function?

DiggerLin (Not Done):

I also wonder why you need this code

std::string Buf;
    raw_string_ostream OS(Buf);
    OS.write_escaped(
        StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" "));
    OS.flush();

to convert it to std::string.

StringRef already has a str() method that returns a std::string. Why not use that?
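
The difference between the two is that `str()` copies the bytes verbatim, whereas `raw_ostream::write_escaped` escapes special and non-printable characters, so a corrupt field cannot put raw control bytes into the diagnostic. Both suggestions (keep the escaping, remove the duplication) fit in one helper. The sketch below is self-contained: `escapeField` only approximates `write_escaped` (LLVM's exact escape format may differ), and `nonDecimalFieldMessage` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Printable ASCII is copied, backslash and quote are escaped, and every
// other byte becomes \xHH.
std::string escapeField(std::string_view Raw) {
  std::string Out;
  for (unsigned char C : Raw) {
    if (C == '\\' || C == '"') {
      Out += '\\';
      Out += static_cast<char>(C);
    } else if (C >= 0x20 && C < 0x7f) {
      Out += static_cast<char>(C);
    } else {
      char Buf[5]; // "\xHH" plus NUL
      std::snprintf(Buf, sizeof(Buf), "\\x%02x", static_cast<unsigned>(C));
      Out += Buf;
    }
  }
  return Out;
}

// One place that builds the repeated "not all decimal numbers" message.
std::string nonDecimalFieldMessage(std::string_view FieldName,
                                   std::string_view RawValue,
                                   uint64_t Offset) {
  return "characters in " + std::string(FieldName) +
         " field in archive header are not all decimal numbers: '" +
         escapeField(RawValue) +
         "' for the archive member header at offset " +
         std::to_string(Offset);
}
```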

"are not all decimal numbers: '" + Buf + "' for the " "are not all decimal numbers: '" + Buf + "' for the "

"archive member header at offset " + Twine(Offset)); "archive member header at offset " + Twine(Offset));

} }

return static_cast<sys::fs::perms>(Ret); return static_cast<sys::fs::perms>(Ret);

} }

// This gets the raw name from the ArMemHdr->LastModified field.

StringRef ArchiveMemberHeader::getRawLastModified() const {

return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified)).rtrim(' ');

Lint: Pre-merge checks

clang-format: please reformat the code

-  return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified)).rtrim(' ');
+  return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified))
+      .rtrim(' ');

}

StringRef BigArchiveMemberHeader::getRawLastModified() const {

return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified)).rtrim(' ');

Lint: Pre-merge checks

clang-format: please reformat the code

-  return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified)).rtrim(' ');
+  return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified))
+      .rtrim(' ');

}

Expected<sys::TimePoint<std::chrono::seconds>> Expected<sys::TimePoint<std::chrono::seconds>>

ArchiveMemberHeader::getLastModified() const { AbstractArchiveMemberHeader::getLastModified() const {

unsigned Seconds; unsigned Seconds;

if (StringRef(ArMemHdr->LastModified, if (getRawLastModified().getAsInteger(10, Seconds)) {

sizeof(ArMemHdr->LastModified)).rtrim(' ')

.getAsInteger(10, Seconds)) {

std::string Buf; std::string Buf;

raw_string_ostream OS(Buf); raw_string_ostream OS(Buf);

OS.write_escaped(StringRef(ArMemHdr->LastModified, OS.write_escaped(StringRef(getRawLastModified()));

sizeof(ArMemHdr->LastModified)).rtrim(" "));

OS.flush(); OS.flush();

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) - uint64_t Offset = getOffset();

Parent->getData().data();

return malformedError("characters in LastModified field in archive header " return malformedError("characters in LastModified field in archive header "

"are not all decimal numbers: '" + Buf + "' for the " "are not all decimal numbers: '" + Buf + "' for the "

"archive member header at offset " + Twine(Offset)); "archive member header at offset " + Twine(Offset));

} }

return sys::toTimePoint(Seconds); return sys::toTimePoint(Seconds);

} }

Expected<unsigned> ArchiveMemberHeader::getUID() const { // This gets the raw name from the ArMemHdr->UID field.

StringRef ArchiveMemberHeader::getRawUID() const {

jhenderson (Not Done):

getRawName appears not to trim the trailing spaces, but getRawUID does. This seems like a logical inconsistency that will likely lead to bugs.

return StringRef(ArMemHdr->UID, sizeof(ArMemHdr->UID)).rtrim(' ');

}

StringRef BigArchiveMemberHeader::getRawUID() const {

return StringRef(ArMemHdr->UID, sizeof(ArMemHdr->UID)).rtrim(' ');

}

Expected<unsigned> AbstractArchiveMemberHeader::getUID() const {

unsigned Ret; unsigned Ret;

StringRef User = StringRef(ArMemHdr->UID, sizeof(ArMemHdr->UID)).rtrim(' '); StringRef User = getRawUID();

if (User.empty()) if (User.empty())

return 0; return 0;

if (User.getAsInteger(10, Ret)) { if (User.getAsInteger(10, Ret)) {

std::string Buf; std::string Buf;

raw_string_ostream OS(Buf); raw_string_ostream OS(Buf);

OS.write_escaped(User); OS.write_escaped(User);

OS.flush(); OS.flush();

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) - uint64_t Offset = getOffset();

Parent->getData().data();

return malformedError("characters in UID field in archive header " return malformedError("characters in UID field in archive header "

"are not all decimal numbers: '" + Buf + "' for the " "are not all decimal numbers: '" + Buf + "' for the "

"archive member header at offset " + Twine(Offset)); "archive member header at offset " + Twine(Offset));

} }

return Ret; return Ret;

} }

Expected<unsigned> ArchiveMemberHeader::getGID() const { // This gets the raw name from the ArMemHdr->GID field.

StringRef ArchiveMemberHeader::getRawGID() const {

return StringRef(ArMemHdr->GID, sizeof(ArMemHdr->GID)).rtrim(' ');

}

StringRef BigArchiveMemberHeader::getRawGID() const {

return StringRef(ArMemHdr->GID, sizeof(ArMemHdr->GID)).rtrim(' ');

}

Expected<unsigned> AbstractArchiveMemberHeader::getGID() const {

unsigned Ret; unsigned Ret;

StringRef Group = StringRef(ArMemHdr->GID, sizeof(ArMemHdr->GID)).rtrim(' '); StringRef Group = getRawGID();

if (Group.empty()) if (Group.empty())

return 0; return 0;

if (Group.getAsInteger(10, Ret)) { if (Group.getAsInteger(10, Ret)) {

std::string Buf; std::string Buf;

raw_string_ostream OS(Buf); raw_string_ostream OS(Buf);

OS.write_escaped(Group); OS.write_escaped(Group);

OS.flush(); OS.flush();

uint64_t Offset = reinterpret_cast<const char *>(ArMemHdr) - uint64_t Offset = getOffset();

Parent->getData().data();

return malformedError("characters in GID field in archive header " return malformedError("characters in GID field in archive header "

"are not all decimal numbers: '" + Buf + "' for the " "are not all decimal numbers: '" + Buf + "' for the "

"archive member header at offset " + Twine(Offset)); "archive member header at offset " + Twine(Offset));

} }

return Ret; return Ret;

} }

Archive::Child::Child(const Archive *Parent, StringRef Data, Archive::Child::Child(const Archive *Parent, StringRef Data,

jhenderson (Done):

Don't add this comment. We can see that these are child constructors by the function signature.

uint16_t StartOfFile) uint16_t StartOfFile)

: Parent(Parent), Header(Parent, Data.data(), Data.size(), nullptr), : Parent(Parent), Data(Data), StartOfFile(StartOfFile) {

Data(Data), StartOfFile(StartOfFile) { if (Parent->kind() != K_XCOFF) {

Header = new ArchiveMemberHeader(Parent, Data.data(), Data.size(), nullptr);

} else {

jhenderson (Done):

Use std::make_unique to create a unique pointer without needing to explicitly mention new. Same everywhere else you've done this.
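
A minimal sketch of the suggestion with hypothetical reduced types (`HeaderBase`, `RegularHeader`, `BigHeader`, `Kind`, `makeHeader`). The factory returns a `std::unique_ptr` to the base class, so neither a naked `new` nor a hand-written destructor is needed. If the owning `Child` must stay copyable, a virtual `clone()` is one way to copy the header; that part is an assumption about the design rather than something the patch shows.

```cpp
#include <cassert>
#include <memory>

// Only ownership is modelled here.
struct HeaderBase {
  virtual ~HeaderBase() = default;
  virtual bool isBig() const = 0;
  // Lets an owner that must stay copyable duplicate its header.
  virtual std::unique_ptr<HeaderBase> clone() const = 0;
};

struct RegularHeader final : HeaderBase {
  bool isBig() const override { return false; }
  std::unique_ptr<HeaderBase> clone() const override {
    return std::make_unique<RegularHeader>(*this);
  }
};

struct BigHeader final : HeaderBase {
  bool isBig() const override { return true; }
  std::unique_ptr<HeaderBase> clone() const override {
    return std::make_unique<BigHeader>(*this);
  }
};

enum class Kind { GNU, BSD, AIXBig };

// Picks the concrete header type; the unique_ptr frees it automatically.
std::unique_ptr<HeaderBase> makeHeader(Kind K) {
  if (K == Kind::AIXBig)
    return std::make_unique<BigHeader>();
  return std::make_unique<RegularHeader>();
}
```

A conditional expression such as `K == Kind::AIXBig ? std::make_unique<BigHeader>() : std::make_unique<RegularHeader>()` would not compile, because the two operands have unrelated `unique_ptr` types; hence the `if`.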

Header = new BigArchiveMemberHeader(Parent, Data.data(), Data.size(), nullptr);

Lint: Pre-merge checks

clang-format: please reformat the code

-    Header = new BigArchiveMemberHeader(Parent, Data.data(), Data.size(), nullptr);
+    Header =
+        new BigArchiveMemberHeader(Parent, Data.data(), Data.size(), nullptr);

}

} }

Archive::Child::Child(const Archive *Parent, const char *Start, Error *Err) Archive::Child::Child(const Archive *Parent, const char *Start, Error *Err)

: Parent(Parent), : Parent(Parent) {

Header(Parent, Start,

Parent

? Parent->getData().size() - (Start - Parent->getData().data())

: 0, Err) {

if (!Start) if (!Start)

return; return;

jhenderson (Done):

Don't add a blank line at the function start.

if (Parent->kind() != K_XCOFF) {

Header = new ArchiveMemberHeader(Parent, Start,

Lint: Pre-merge checks

clang-format: please reformat the code

-    Header = new ArchiveMemberHeader(Parent, Start,
-      Parent
-      ? Parent->getData().size() - (Start - Parent->getData().data()) : 0, Err);
+    Header = new ArchiveMemberHeader(
+        Parent, Start,
+        Parent ? Parent->getData().size() - (Start - Parent->getData().data())
+               : 0,
+        Err);

Parent

jhenderson (Not Done):

These if/else checks clearly indicate that Child should really be BigArchiveChild and something else, sharing a common Child abstract base class.

? Parent->getData().size() - (Start - Parent->getData().data()) : 0, Err);

} else {

Header = new BigArchiveMemberHeader(Parent, Start,

Lint: Pre-merge checks

clang-format: please reformat the code

-    Header = new BigArchiveMemberHeader(Parent, Start,
-      Parent
-      ? Parent->getData().size() - (Start - Parent->getData().data()) : 0, Err);
+    Header = new BigArchiveMemberHeader(
+        Parent, Start,
+        Parent ? Parent->getData().size() - (Start - Parent->getData().data())
+               : 0,
+        Err);

Parent

? Parent->getData().size() - (Start - Parent->getData().data()) : 0, Err);

}

// If we are pointed to real data, Start is not a nullptr, then there must be // If we are pointed to real data, Start is not a nullptr, then there must be

// a non-null Err pointer available to report malformed data on. Only in // a non-null Err pointer available to report malformed data on. Only in

// the case sentinel value is being constructed is Err is permitted to be a // the case sentinel value is being constructed is Err is permitted to be a

// nullptr. // nullptr.

assert(Err && "Err can't be nullptr if Start is not a nullptr"); assert(Err && "Err can't be nullptr if Start is not a nullptr");

ErrorAsOutParameter ErrAsOutParam(Err); ErrorAsOutParameter ErrAsOutParam(Err);

// If there was an error in the construction of the Header // If there was an error in the construction of the Header

// then just return with the error now set. // then just return with the error now set.

if (*Err) if (*Err)

return; return;

uint64_t Size = Header.getSizeOf(); uint64_t Size = Header->getSizeOf();

Data = StringRef(Start, Size); Data = StringRef(Start, Size);

Expected<bool> isThinOrErr = isThinMember(); Expected<bool> isThinOrErr = isThinMember();

if (!isThinOrErr) { if (!isThinOrErr) {

*Err = isThinOrErr.takeError(); *Err = isThinOrErr.takeError();

return; return;

} }

bool isThin = isThinOrErr.get(); bool isThin = isThinOrErr.get();

if (!isThin) { if (!isThin) {

Expected<uint64_t> MemberSize = getRawSize(); Expected<uint64_t> MemberSize = getRawSize();

if (!MemberSize) { if (!MemberSize) {

*Err = MemberSize.takeError(); *Err = MemberSize.takeError();

return; return;

} }

Size += MemberSize.get(); Size += MemberSize.get();

Data = StringRef(Start, Size); Data = StringRef(Start, Size);

} }

// Setup StartOfFile and PaddingBytes. // Setup StartOfFile and PaddingBytes.

StartOfFile = Header.getSizeOf(); StartOfFile = Header->getSizeOf();

// Don't include attached name. // Don't include attached name.

Expected<StringRef> NameOrErr = getRawName(); Expected<StringRef> NameOrErr = getRawName();

if (!NameOrErr){ if (!NameOrErr){

*Err = NameOrErr.takeError(); *Err = NameOrErr.takeError();

return; return;

} }

StringRef Name = NameOrErr.get(); StringRef Name = NameOrErr.get();

if (Parent->kind() == Archive::K_XCOFF && Parent->fixLengthHeader) {

// Add name to found the real start

// Add also Fixed-Length Header in the first read.

DiggerLin (Not Done):

Change "Name.size() + Name.size() % 2" to "((Name.size() + 1) >> 1) << 1"?
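
Both forms round a length up to the next even number, so the choice is mainly about readability. A quick check of the equivalence, with a third mask-based form for comparison (the helper names are hypothetical); within LLVM, `alignTo(Size, 2)` from `llvm/Support/MathExtras.h` expresses the same intent.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t padModulo(uint64_t N) { return N + N % 2; }          // form used in the patch
constexpr uint64_t padShift(uint64_t N) { return ((N + 1) >> 1) << 1; } // form suggested in review
constexpr uint64_t padMask(uint64_t N) { return (N + 1) & ~uint64_t(1); }

static_assert(padModulo(5) == 6 && padShift(5) == 6 && padMask(5) == 6,
              "odd lengths round up");
static_assert(padModulo(4) == 4 && padShift(4) == 4 && padMask(4) == 4,
              "even lengths are unchanged");
```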

StartOfFile += Name.size() + Name.size()%2;

Lint: Pre-merge checks

clang-format: please reformat the code

-    StartOfFile += Name.size() + Name.size()%2;
+    StartOfFile += Name.size() + Name.size() % 2;

StartOfFile += Header->getFixSizeOf();

} else if (Parent->kind() == Archive::K_XCOFF) {

// Add name to found the real start

StartOfFile += Name.size() + Name.size()%2;

Lint: Pre-merge checks

clang-format: please reformat the code

-    StartOfFile += Name.size() + Name.size()%2;
+    StartOfFile += Name.size() + Name.size() % 2;


}

if (Name.startswith("#1/")) { if (Name.startswith("#1/")) {

DiggerLin

Not Done

The Name.startswith("#1/") check is not meant for Archive::K_AIXBIG. What happens here if an AIX big archive contains a member whose name begins with "#1/"?
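One way to answer this question, sketched under the assumption that the fix is to skip the BSD `#1/<length>` long-name convention entirely for big archives: in that format the member name is stored explicitly after the header (`ar_namlen` followed by the name), so a name starting with `#1/` is just an ordinary name there. `ArchiveFormat` and `bsdLongNameLength` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical format tag; LLVM's own Archive::Kind enum is not reproduced here.
enum class ArchiveFormat { GNU, BSD, AIXBig };

// Returns the length encoded by a BSD-style "#1/<len>" name, or std::nullopt if
// the name should be taken literally. AIX big archives store the name
// explicitly after the member header, so "#1/..." is never reinterpreted there.
std::optional<uint64_t> bsdLongNameLength(std::string_view Name,
                                          ArchiveFormat Format) {
  if (Format == ArchiveFormat::AIXBig || Name.substr(0, 3) != "#1/")
    return std::nullopt;
  std::string_view Digits = Name.substr(3);
  // Header fields are space-padded on the right.
  while (!Digits.empty() && Digits.back() == ' ')
    Digits.remove_suffix(1);
  if (Digits.empty())
    return std::nullopt;
  uint64_t Len = 0;
  for (char C : Digits) {
    // The existing code reports a malformed-archive error in this case;
    // this sketch simply returns std::nullopt.
    if (C < '0' || C > '9')
      return std::nullopt;
    Len = Len * 10 + static_cast<uint64_t>(C - '0');
  }
  return Len;
}
```

A real fix would also need to keep the "not a long name" and "malformed long name" cases apart, which this sketch deliberately folds together.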

uint64_t NameSize; uint64_t NameSize;

if (Name.substr(3).rtrim(' ').getAsInteger(10, NameSize)) { if (Name.substr(3).rtrim(' ').getAsInteger(10, NameSize)) {

std::string Buf; std::string Buf;

raw_string_ostream OS(Buf); raw_string_ostream OS(Buf);

OS.write_escaped(Name.substr(3).rtrim(' ')); OS.write_escaped(Name.substr(3).rtrim(' '));

OS.flush(); OS.flush();

uint64_t Offset = Start - Parent->getData().data(); uint64_t Offset = Start - Parent->getData().data();

*Err = malformedError("long name length characters after the #1/ are " *Err = malformedError("long name length characters after the #1/ are "

"not all decimal numbers: '" + Buf + "' for " "not all decimal numbers: '" + Buf + "' for "

"archive member header at offset " + "archive member header at offset " +

Twine(Offset)); Twine(Offset));

return; return;

} }

StartOfFile += NameSize; StartOfFile += NameSize;

} }

Expected<uint64_t> Archive::Child::getSize() const { Expected<uint64_t> Archive::Child::getSize() const {

if (Parent->IsThin) if (Parent->IsThin)

return Header.getSize(); return Header->getSize();

return Data.size() - StartOfFile; return Data.size() - StartOfFile;

} }

Expected<uint64_t> Archive::Child::getRawSize() const { Expected<uint64_t> Archive::Child::getRawSize() const {

return Header.getSize(); return Header->getSize();

} }

Expected<bool> Archive::Child::isThinMember() const { Expected<bool> Archive::Child::isThinMember() const {

Expected<StringRef> NameOrErr = Header.getRawName(); Expected<StringRef> NameOrErr = Header->getRawName();

if (!NameOrErr) if (!NameOrErr)

return NameOrErr.takeError(); return NameOrErr.takeError();

StringRef Name = NameOrErr.get(); StringRef Name = NameOrErr.get();

return Parent->IsThin && Name != "/" && Name != "//"; return Parent->IsThin && Name != "/" && Name != "//";

} }

Expected<std::string> Archive::Child::getFullName() const { Expected<std::string> Archive::Child::getFullName() const {

Expected<bool> isThin = isThinMember(); Expected<bool> isThin = isThinMember();

[30 lines not shown] Expected<StringRef> Archive::Child::getBuffer() const {

const std::string &FullName = *FullNameOrErr; const std::string &FullName = *FullNameOrErr;

ErrorOr<std::unique_ptr<MemoryBuffer>> Buf = MemoryBuffer::getFile(FullName); ErrorOr<std::unique_ptr<MemoryBuffer>> Buf = MemoryBuffer::getFile(FullName);

if (std::error_code EC = Buf.getError()) if (std::error_code EC = Buf.getError())

return errorCodeToError(EC); return errorCodeToError(EC);

Parent->ThinBuffers.push_back(std::move(*Buf)); Parent->ThinBuffers.push_back(std::move(*Buf));

return Parent->ThinBuffers.back()->getBuffer(); return Parent->ThinBuffers.back()->getBuffer();

} }

Expected<Archive::Child> Archive::Child::getNext() const { Expected<Archive::Child> Archive::Child::getNext() const {

jhenderson

Not Done

It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.


EGuesnet (Author)

Done

> I think you may be trying too hard to match the existing logic

Right.

> My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.

But using the offset requires knowing the current location. You said in another comment that this is highly thread-unsafe and should be avoided.
Moreover, if I use fl_lstmoff instead of Length, I need access to the Fixed-Length Header. It is currently read and forbidden as length of object in archive in the only information we need.

jhenderson

Not Done

> It is currently read and forbidden as length of object in archive in the only information we need.

I don't understand this comment at all. Do you mean forgotten, not forbidden? Perhaps you could change how things are read and stored?

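jhenderson's suggestion can be sketched as follows, assuming the header layout from IBM's "ar File Format (Big)" documentation: start at `fl_fstmoff`, follow each member's `ar_nxtmem`, and stop after the member at `fl_lstmoff`, checking bounds at every step. Each step depends only on the current member header and the two fixed-header values, so no mutable `CurrentLocation` is needed. `parseDecimalField`, `memberOffsets` and the layout constants are hypothetical; treating `fl_fstmoff == 0` as an empty archive is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Parse a space-padded decimal ASCII field, as used in big archive headers.
// Returns std::nullopt for an empty field or any non-digit byte (e.g. NULs).
// Overflow of very long digit strings is not checked in this sketch.
std::optional<uint64_t> parseDecimalField(std::string_view Field) {
  while (!Field.empty() && Field.back() == ' ')
    Field.remove_suffix(1);
  if (Field.empty())
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : Field) {
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + static_cast<uint64_t>(C - '0');
  }
  return Value;
}

// Layout assumed from IBM's documentation.
constexpr std::size_t FixLenHdrSize = 128;     // 8-byte magic + six 20-byte fields
constexpr std::size_t FstMOffPos = 8 + 3 * 20; // fl_fstmoff
constexpr std::size_t LstMOffPos = 8 + 4 * 20; // fl_lstmoff
constexpr std::size_t NxtMemPos = 20;          // ar_nxtmem, right after ar_size
constexpr std::size_t MemberHdrSize = 3 * 20 + 4 * 12 + 4; // through ar_namlen

// Collect member offsets by following ar_nxtmem from fl_fstmoff until the
// member at fl_lstmoff is reached.
std::optional<std::vector<uint64_t>> memberOffsets(std::string_view Buf) {
  if (Buf.size() < FixLenHdrSize || Buf.substr(0, 8) != "<bigaf>\n")
    return std::nullopt;
  std::optional<uint64_t> First = parseDecimalField(Buf.substr(FstMOffPos, 20));
  std::optional<uint64_t> Last = parseDecimalField(Buf.substr(LstMOffPos, 20));
  if (!First || !Last)
    return std::nullopt;
  std::vector<uint64_t> Offsets;
  if (*First == 0) // assumed to mean "no members"
    return Offsets;
  for (uint64_t Off = *First;;) {
    // The member header must lie entirely inside the buffer.
    if (Off > Buf.size() || Buf.size() - Off < MemberHdrSize)
      return std::nullopt;
    // More members than could fit in the buffer means ar_nxtmem loops.
    if (Offsets.size() >= Buf.size() / MemberHdrSize)
      return std::nullopt;
    Offsets.push_back(Off);
    if (Off == *Last)
      return Offsets;
    std::optional<uint64_t> Next =
        parseDecimalField(Buf.substr(Off + NxtMemPos, 20));
    if (!Next)
      return std::nullopt;
    Off = *Next;
  }
}
```

The loop bound is used instead of requiring increasing offsets because, with a free list, a later member could in principle live at a lower offset.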

size_t SpaceToSkip = Data.size(); size_t SpaceToSkip = Data.size();

// If it's odd, add 1 to make it even. // If it's odd, add 1 to make it even.

if (SpaceToSkip & 1) if (SpaceToSkip & 1)

++SpaceToSkip; ++SpaceToSkip;

const char *NextLoc = Data.data() + SpaceToSkip; const char *NextLoc = Data.data() + SpaceToSkip;

// Update the current location.

CurrentLocation += SpaceToSkip;

// On AIX, stop at the member table.

// Check to see if this is at the end of the archive. // Check to see if this is at the end of the archive.

if (NextLoc == Parent->Data.getBufferEnd()) if (NextLoc == Parent->Data.getBufferEnd() ||

(Parent->kind() == K_XCOFF && CurrentLocation + MAGIC_LEN == Parent->Length) )

Lint: Pre-merge checks

clang-format: please reformat the code

-      (Parent->kind() == K_XCOFF && CurrentLocation + MAGIC_LEN == Parent->Length) )
-  {
+      (Parent->kind() == K_XCOFF &&
+       CurrentLocation + MAGIC_LEN == Parent->Length)) {


{

return Child(nullptr, nullptr, nullptr); return Child(nullptr, nullptr, nullptr);

}

jhenderson

Not Done

Revert this change.


// Check to see if this is past the end of the archive. // Check to see if this is past the end of the archive.

if (NextLoc > Parent->Data.getBufferEnd()) { if ((NextLoc > Parent->Data.getBufferEnd()) ||

(Parent->kind() == K_XCOFF && Parent->CurrentLocation + MAGIC_LEN > Parent->Length)) {

Lint: Pre-merge checks

clang-format: please reformat the code

-      (Parent->kind() == K_XCOFF && Parent->CurrentLocation + MAGIC_LEN > Parent->Length)) {
+      (Parent->kind() == K_XCOFF &&
+       Parent->CurrentLocation + MAGIC_LEN > Parent->Length)) {


std::string Msg("offset to next archive member past the end of the archive " std::string Msg("offset to next archive member past the end of the archive "

"after member "); "after member ");

Expected<StringRef> NameOrErr = getName(); Expected<StringRef> NameOrErr = getName();

if (!NameOrErr) { if (!NameOrErr) {

consumeError(NameOrErr.takeError()); consumeError(NameOrErr.takeError());

uint64_t Offset = Data.data() - Parent->getData().data(); uint64_t Offset = Data.data() - Parent->getData().data();

return malformedError(Msg + "at offset " + Twine(Offset)); return malformedError(Msg + "at offset " + Twine(Offset));

} else } else

[14 lines not shown] uint64_t Archive::Child::getChildOffset() const {

return offset; return offset;

} }

Expected<StringRef> Archive::Child::getName() const { Expected<StringRef> Archive::Child::getName() const {

Expected<uint64_t> RawSizeOrErr = getRawSize(); Expected<uint64_t> RawSizeOrErr = getRawSize();

if (!RawSizeOrErr) if (!RawSizeOrErr)

return RawSizeOrErr.takeError(); return RawSizeOrErr.takeError();

uint64_t RawSize = RawSizeOrErr.get(); uint64_t RawSize = RawSizeOrErr.get();

Expected<StringRef> NameOrErr = Header.getName(Header.getSizeOf() + RawSize); Expected<StringRef> NameOrErr = Header->getName(Header->getSizeOf() + RawSize);

Lint: Pre-merge checks

clang-format: please reformat the code

-  Expected<StringRef> NameOrErr = Header->getName(Header->getSizeOf() + RawSize);
+  Expected<StringRef> NameOrErr =
+      Header->getName(Header->getSizeOf() + RawSize);


if (!NameOrErr) if (!NameOrErr)

jhenderson

Not Done

This looks like an unrelated formatting change.


return NameOrErr.takeError(); return NameOrErr.takeError();

StringRef Name = NameOrErr.get(); StringRef Name = NameOrErr.get();

return Name; return Name;

} }

Expected<MemoryBufferRef> Archive::Child::getMemoryBufferRef() const { Expected<MemoryBufferRef> Archive::Child::getMemoryBufferRef() const {

Expected<StringRef> NameOrErr = getName(); Expected<StringRef> NameOrErr = getName();

if (!NameOrErr) if (!NameOrErr)

[34 lines not shown] Archive::Archive(MemoryBufferRef Source, Error &Err)

: Binary(Binary::ID_Archive, Source) { : Binary(Binary::ID_Archive, Source) {

ErrorAsOutParameter ErrAsOutParam(&Err); ErrorAsOutParameter ErrAsOutParam(&Err);

StringRef Buffer = Data.getBuffer(); StringRef Buffer = Data.getBuffer();

// Check for sufficient magic. // Check for sufficient magic.

if (Buffer.startswith(ThinMagic)) { if (Buffer.startswith(ThinMagic)) {

IsThin = true; IsThin = true;

} else if (Buffer.startswith(Magic)) { } else if (Buffer.startswith(Magic)) {

IsThin = false; IsThin = false;

} else if (Buffer.startswith(BigMagic)) {

IsThin = false;

} else { } else {

Err = make_error<GenericBinaryError>("file too small to be an archive", Err = make_error<GenericBinaryError>("file too small to be an archive",

object_error::invalid_file_type); object_error::invalid_file_type);

return; return;

} }

// Make sure Format is initialized before any call to // Make sure Format is initialized before any call to

// ArchiveMemberHeader::getName() is made. This could be a valid empty // ArchiveMemberHeader::getName() is made. This could be a valid empty

// archive which is the same in all formats. So claiming it to be gnu to is // archive which is the same in all formats. So claiming it to be gnu to is

// fine if not totally correct before we look for a string table or table of // fine if not totally correct before we look for a string table or table of

// contents. // contents.

jhenderson

Done

This comment implies an empty archive is the same across all formats, but it looks like that's not the case for the Big Archive format. As I understand from my reading, we can't iterate over children to see if the archive is truly empty using the GNU format, for an AIX archive, so the comment needs updating.


if (Buffer.startswith(BigMagic)) {

Format = K_XCOFF;

} else

Format = K_GNU; Format = K_GNU;

// Get the special members. // Get the special members.

child_iterator I = child_begin(Err, false); child_iterator I = child_begin(Err, false);

if (Err) if (Err)

return; return;

child_iterator E = child_end(); child_iterator E = child_end();

// See if this is a valid empty archive and if so return. // See if this is a valid empty archive and if so return.

[13 lines not shown] Archive::Archive(MemoryBufferRef Source, Error &Err)

Expected<StringRef> NameOrErr = C->getRawName(); Expected<StringRef> NameOrErr = C->getRawName();

if (!NameOrErr) { if (!NameOrErr) {

Err = NameOrErr.takeError(); Err = NameOrErr.takeError();

return; return;

} }

StringRef Name = NameOrErr.get(); StringRef Name = NameOrErr.get();

// The AIX big archive format is totally different from all the others.

if (Buffer.startswith(BigMagic)) {

jhenderson

Done

I'd move this logic below the following comment, since although the process is different, it is still determining the archive format.

You can simply add a note saying something like:

// AIX Big archive format
//   Identified purely by magic bytes and uses a unique format.

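A rough sketch of identifying the format purely from the magic bytes, assuming the three 8-byte magics `!<arch>\n`, `!<thin>\n` and `<bigaf>\n` (the patch's `Magic`, `ThinMagic` and `BigMagic`). Only the regular `!<arch>\n` case needs the special-member inspection described in the comment that follows; a big archive is fully identified by its magic. `ArchiveMagicKind` and `classifyArchiveMagic` are hypothetical names.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical classification by leading magic bytes only.
enum class ArchiveMagicKind { Regular, Thin, AIXBig, Unknown };

ArchiveMagicKind classifyArchiveMagic(std::string_view Buffer) {
  // All three magics are 8 bytes long; substr clamps on shorter buffers.
  std::string_view Head = Buffer.substr(0, 8);
  if (Head == "!<arch>\n")
    return ArchiveMagicKind::Regular; // GNU, BSD, ...: refined from special members
  if (Head == "!<thin>\n")
    return ArchiveMagicKind::Thin;
  if (Head == "<bigaf>\n")
    return ArchiveMagicKind::AIXBig; // format fully determined by the magic
  return ArchiveMagicKind::Unknown;
}
```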

Format = K_XCOFF;

// Length of archive (all object file + header)

// is offset to member table, located at 8->27.

jhenderson

Done

Don't reflow your comments before the 80-character column width. Let clang-format do that for you.

I'm not really sure what this comment is trying to say, if I'm honest.

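For reference, bytes 8 to 27 that the code below reads into `Length` are the `fl_memoff` field, the offset of the member table (1070 in the `small.a` dump later in this review, which is also the `ar_nxtmem` of its only member). The sketch below mirrors the Fixed-Length Header with field names assumed from IBM's documentation; `BigArFixLenHdr` is a hypothetical struct, not the patch's `ArFixLenHdrType`. Note also that `StringRef::getAsInteger` returns true on failure, and its result is ignored in the call below.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the AIX big archive Fixed-Length Header. Every
// numeric field is space-padded decimal ASCII, so the struct holds only char
// arrays and has no internal padding.
struct BigArFixLenHdr {
  char fl_magic[8];     // "<bigaf>\n"
  char fl_memoff[20];   // offset of the member table
  char fl_gstoff[20];   // offset of the 32-bit global symbol table
  char fl_gst64off[20]; // offset of the 64-bit global symbol table
  char fl_fstmoff[20];  // offset of the first member
  char fl_lstmoff[20];  // offset of the last member
  char fl_freeoff[20];  // offset of the first member on the free list
};

static_assert(sizeof(BigArFixLenHdr) == 128, "fixed-length header is 128 bytes");
static_assert(offsetof(BigArFixLenHdr, fl_memoff) == 8,
              "fl_memoff occupies bytes 8..27");
static_assert(offsetof(BigArFixLenHdr, fl_fstmoff) == 68,
              "fl_fstmoff occupies bytes 68..87");
```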

Buffer.substr(8, 20).rtrim(" ").getAsInteger(10, Length);

setFirstRegular(*C);

Err = Error::success();

return;

}

// Below is the pattern that is used to figure out the archive format // Below is the pattern that is used to figure out the archive format

// GNU archive format // GNU archive format

// First member : / (may exist, if it exists, points to the symbol table ) // First member : / (may exist, if it exists, points to the symbol table )

// Second member : // (may exist, if it exists, points to the string table) // Second member : // (may exist, if it exists, points to the string table)

// Note : The string table is used if the filename exceeds 15 characters // Note : The string table is used if the filename exceeds 15 characters

// BSD archive format // BSD archive format

// First member : __.SYMDEF or "__.SYMDEF SORTED" (the symbol table) // First member : __.SYMDEF or "__.SYMDEF SORTED" (the symbol table)

// There is no string table, if the filename exceeds 15 characters or has a // There is no string table, if the filename exceeds 15 characters or has a

[185 lines not shown] Archive::child_iterator Archive::child_begin(Error &Err,

if (Err) if (Err)

return child_end(); return child_end();

return child_iterator::itr(C, Err); return child_iterator::itr(C, Err);

} }

Archive::child_iterator Archive::child_end() const { Archive::child_iterator Archive::child_end() const {

return child_iterator::end(Child(nullptr, nullptr, nullptr)); return child_iterator::end(Child(nullptr, nullptr, nullptr));

} }

Esme

Not Done

It's not correct to calculate the offset of the first archive member as `Data.getBufferStart() + strlen(Magic) + sizeof(Archive::ArFixLenHdrType)`; please use the value of `ArFixLenHdr->FirstArOffset` instead.
You can easily reproduce the bug by testing an archive file that contains an object file member, like in the comment I added before:

$ xlc 1.c -o 1.o
$ ar -v -q 1.a 1.o
$ llvm-ar tv 1.a
llvm-ar: error: unable to load '1.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128)

The correct offset for this case should be 138 instead of 128.


EGuesnet (Author)

Done

First, LLVM is no longer a priority for our team, so I cannot spend time on this PR.
Second, I am not able to reproduce the issue. Please give me the contents of 1.c and 1.a.

// small.c
int add_two (int a) {
        return a + 2;
}

$ xlc -v
C for AIX Compiler, Version 5

OK, it's really old, but that doesn't matter for the archive format.

$ xlc small.c  -o small.o -c
$ ar -v -q small.a small.o

$ xxd small.a

00000000: 3c62 6967 6166 3e0a 3130 3730 2020 2020  <bigaf>.1070
00000010: 2020 2020 2020 2020 2020 2020 3132 3332              1232
00000020: 2020 2020 2020 2020 2020 2020 2020 2020
00000030: 3020 2020 2020 2020 2020 2020 2020 2020  0
00000040: 2020 2020 3132 3820 2020 2020 2020 2020      128
00000050: 2020 2020 2020 2020 3132 3820 2020 2020          128
00000060: 2020 2020 2020 2020 2020 2020 3020 2020              0
00000070: 2020 2020 2020 2020 2020 2020 2020 2020
// End of Fixed-Length Header: size is 128
00000080: 3832 3020 2020 2020 2020 2020 2020 2020  820
00000090: 2020 2020 3130 3730 2020 2020 2020 2020      1070
000000a0: 2020 2020 2020 2020 3020 2020 2020 2020          0
000000b0: 2020 2020 2020 2020 2020 2020 3136 3331              1631
000000c0: 3630 3435 3039 2020 3020 2020 2020 2020  604509  0
000000d0: 2020 2020 3020 2020 2020 2020 2020 2020      0
000000e0: 3634 3420 2020 2020 2020 2020 3720 2020  644         7
[...]

$ llvm-ar --version
LLVM version 13.0.0

$ llvm-ar t small.a
small.o # OK

Third, according to the documentation (https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big), the Fixed-Length Header must have a size of 128 bytes. 138 is not valid.


Esme

Not Done

Thanks for your reply!

If you are no longer following up on this patch, I'd be happy to continue your work. What do you think?

In my previous comment, the member added to the archive was a linked binary, whereas in your case the archive contains an object member (compiled with the -c option).

You can also reproduce the issue by adding a shared library to an archive.

$ cat 1.c
int main() {
return 1;
}
$ xlc 1.c -qmkshrobj -o libt.so
$ ar -v -q 1.a libt.so
$ xxd 1.a 
00000000: 3c62 6967 6166 3e0a 3134 3532 2020 2020  <bigaf>.1452    
00000010: 2020 2020 2020 2020 2020 2020 3136 3134              1614
00000020: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000030: 3020 2020 2020 2020 2020 2020 2020 2020  0               
00000040: 2020 2020 3133 3420 2020 2020 2020 2020      134         
00000050: 2020 2020 2020 2020 3133 3420 2020 2020          134     
00000060: 2020 2020 2020 2020 2020 2020 3020 2020              0   
00000070: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000080: 0000 0000 0000 3131 3935 2020 2020 2020  ......1195    
// There is some padding between the Fixed-Length Header and the first member.
00000090: 2020 2020 2020 2020 2020 3134 3532 2020            1452  
000000a0: 2020 2020 2020 2020 2020 2020 2020 3020                0 
000000b0: 2020 2020 2020 2020 2020 2020 2020 2020

It's correct that the Fixed-Length Header must have a size of 128 bytes, and 138 is not a valid header size. However, as shown above, there may be some padding between the Fixed-Length Header and the first member, so I think the value of ArFixLenHdr->FirstArOffset is the exact offset to the first member.


EGuesnet (Author)

Done

Using ArFixLenHdr->FirstArOffset does not work.

truncated or malformed archive (characters in name length field in archive header are not all decimal numbers: '' for archive member header at offset 68)

I do not know where this "68" comes from.

In my opinion, this PR is complex enough already. It should be accepted with as few changes as possible, so without the free list, undocumented features, and so on. Later, you can create a new PR to extend Big Archive support, with new tests.


Esme

Not Done

OK, thanks!
Do you get the FirstArOffset value via `StringRef(ArFixLenHdr->FirstArOffset, sizeof(ArFixLenHdr->FirstArOffset)).rtrim(' ').getAsInteger(10, Size)`?


EGuesnet (Author)

Done

My mistake, you are right. I was using FirstArOffset directly...
But handling the padded case requires a test with a binary file, and the LLVM community wants to avoid adding binaries to tests. So I still think a separate PR after this one is preferable.

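Esme's suggestion amounts to parsing `fl_fstmoff` as a space-padded decimal field rather than using the field's address. A minimal sketch, assuming the field sits at bytes 68 to 87 (8-byte magic plus three 20-byte fields); `firstMemberOffset` is a hypothetical helper. Incidentally, 68 is exactly the position of `fl_fstmoff` within the header, which would be consistent with the earlier error message if the field's address, rather than its value, was used as the offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: read fl_fstmoff from the Fixed-Length Header as
// space-padded decimal ASCII. The first member starts at this value, which
// may be larger than 128 when padding follows the header.
std::optional<uint64_t> firstMemberOffset(std::string_view Buf) {
  constexpr std::size_t FstMOffPos = 8 + 3 * 20; // magic + three 20-byte fields
  if (Buf.size() < 128)
    return std::nullopt;
  std::string_view Field = Buf.substr(FstMOffPos, 20);
  while (!Field.empty() && Field.back() == ' ')
    Field.remove_suffix(1);
  if (Field.empty())
    return std::nullopt;
  uint64_t Value = 0;
  for (char C : Field) {
    if (C < '0' || C > '9') // e.g. NUL bytes in a malformed header
      return std::nullopt;
    Value = Value * 10 + static_cast<uint64_t>(C - '0');
  }
  return Value;
}
```

In LLVM terms this corresponds to the `rtrim(' ').getAsInteger(10, ...)` call quoted above, keeping in mind that `getAsInteger` returns true on failure.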

StringRef Archive::Symbol::getName() const { StringRef Archive::Symbol::getName() const {

return Parent->getSymbolTable().begin() + StringIndex; return Parent->getSymbolTable().begin() + StringIndex;

} }

Expected<Archive::Child> Archive::Symbol::getMember() const { Expected<Archive::Child> Archive::Symbol::getMember() const {

jhenderson

Done

Do you actually need to change this method in the first instance? You don't have any testing for symbol printing (and I don't think you should at this point...).


EGuesnet (Author)

Done

I have removed code related to symbol read.


const char *Buf = Parent->getSymbolTable().begin(); const char *Buf = Parent->getSymbolTable().begin();

const char *Offsets = Buf; const char *Offsets = Buf;

if (Parent->kind() == K_GNU64 || Parent->kind() == K_DARWIN64) if (Parent->kind() == K_GNU64 || Parent->kind() == K_DARWIN64)

Offsets += sizeof(uint64_t); Offsets += sizeof(uint64_t);

else if (Parent->kind() == K_XCOFF)

Offsets += 20;

// Each offset is 20 bytes long

Lint: Pre-merge checks

clang-format: please reformat the code

-    // Each offset is 20 bytes long
+  // Each offset is 20 bytes long


else else

jhenderson

Done

Offsets += sizeof(uint64_t);

- else if (Parent->kind() == K_XCOFF)

+ else if (Parent->kind() == K_XCOFF) {

+ // Each offset is 20 bytes long.

Offsets += 20;

- // Each offset is 20 bytes long

- else

+ } else

Offsets += sizeof(uint32_t);


Offsets += sizeof(uint32_t); Offsets += sizeof(uint32_t);

uint64_t Offset = 0; uint64_t Offset = 0;

if (Parent->kind() == K_GNU) { if (Parent->kind() == K_GNU) {

Offset = read32be(Offsets + SymbolIndex * 4); Offset = read32be(Offsets + SymbolIndex * 4);

} else if (Parent->kind() == K_GNU64) { } else if (Parent->kind() == K_GNU64) {

Offset = read64be(Offsets + SymbolIndex * 8); Offset = read64be(Offsets + SymbolIndex * 8);

} else if (Parent->kind() == K_BSD) { } else if (Parent->kind() == K_BSD) {

// The SymbolIndex is an index into the ranlib structs that start at // The SymbolIndex is an index into the ranlib structs that start at

// Offsets (the first uint32_t is the number of bytes of the ranlib // Offsets (the first uint32_t is the number of bytes of the ranlib

// structs). The ranlib structs are a pair of uint32_t's the first // structs). The ranlib structs are a pair of uint32_t's the first

// being a string table offset and the second being the offset into // being a string table offset and the second being the offset into

// the archive of the member that defines the symbol. Which is what // the archive of the member that defines the symbol. Which is what

// is needed here. // is needed here.

Offset = read32le(Offsets + SymbolIndex * 8 + 4); Offset = read32le(Offsets + SymbolIndex * 8 + 4);

} else if (Parent->kind() == K_DARWIN64) { } else if (Parent->kind() == K_DARWIN64) {

// The SymbolIndex is an index into the ranlib_64 structs that start at // The SymbolIndex is an index into the ranlib_64 structs that start at

// Offsets (the first uint64_t is the number of bytes of the ranlib_64 // Offsets (the first uint64_t is the number of bytes of the ranlib_64

// structs). The ranlib_64 structs are a pair of uint64_t's the first // structs). The ranlib_64 structs are a pair of uint64_t's the first

// being a string table offset and the second being the offset into // being a string table offset and the second being the offset into

// the archive of the member that defines the symbol. Which is what // the archive of the member that defines the symbol. Which is what

// is needed here. // is needed here.

Offset = read64le(Offsets + SymbolIndex * 16 + 8); Offset = read64le(Offsets + SymbolIndex * 16 + 8);

} else if (Parent->kind() == K_XCOFF) {

Offset = read64be(Offsets + (SymbolIndex + 1) * 20);

} else { } else {

// Skip offsets. // Skip offsets.

uint32_t MemberCount = read32le(Buf); uint32_t MemberCount = read32le(Buf);

Buf += MemberCount * 4 + 4; Buf += MemberCount * 4 + 4;

uint32_t SymbolCount = read32le(Buf); uint32_t SymbolCount = read32le(Buf);

if (SymbolIndex >= SymbolCount) if (SymbolIndex >= SymbolCount)

return errorCodeToError(object_error::parse_failed); return errorCodeToError(object_error::parse_failed);

[99 lines not shown] if (kind() == K_GNU) {

ranlib_count = read64le(buf) / 16; ranlib_count = read64le(buf) / 16;

const char *ranlibs = buf + 8; const char *ranlibs = buf + 8;

uint64_t ran_strx = 0; uint64_t ran_strx = 0;

ran_strx = read64le(ranlibs); ran_strx = read64le(ranlibs);

buf += sizeof(uint64_t) + (ranlib_count * (2 * (sizeof(uint64_t)))); buf += sizeof(uint64_t) + (ranlib_count * (2 * (sizeof(uint64_t))));

// Skip the byte count of the string table. // Skip the byte count of the string table.

buf += sizeof(uint64_t); buf += sizeof(uint64_t);

buf += ran_strx; buf += ran_strx;

} else if (kind() == K_XCOFF) {

uint64_t symbol_count = read64be(buf);

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'symbol_count' [readability-identifier-naming]
not useful


buf += sizeof(uint64_t) + (symbol_count * (sizeof(uint64_t)));

} else { } else {

uint32_t member_count = 0; uint32_t member_count = 0;

uint32_t symbol_count = 0; uint32_t symbol_count = 0;

member_count = read32le(buf); member_count = read32le(buf);

buf += 4 + (member_count * 4); // Skip offsets. buf += 4 + (member_count * 4); // Skip offsets.

symbol_count = read32le(buf); symbol_count = read32le(buf);

buf += 4 + (symbol_count * 2); // Skip indices. buf += 4 + (symbol_count * 2); // Skip indices.

} }

[12 lines not shown] uint32_t Archive::getNumberOfSymbols() const {

if (kind() == K_GNU) if (kind() == K_GNU)

return read32be(buf); return read32be(buf);

if (kind() == K_GNU64) if (kind() == K_GNU64)

return read64be(buf); return read64be(buf);

if (kind() == K_BSD) if (kind() == K_BSD)

return read32le(buf) / 8; return read32le(buf) / 8;

if (kind() == K_DARWIN64) if (kind() == K_DARWIN64)

return read64le(buf) / 16; return read64le(buf) / 16;

if (kind() == K_XCOFF)

return read64be(buf);

uint32_t member_count = 0; uint32_t member_count = 0;

member_count = read32le(buf); member_count = read32le(buf);

buf += 4 + (member_count * 4); // Skip offsets. buf += 4 + (member_count * 4); // Skip offsets.

return read32le(buf); return read32le(buf);

} }

Expected<Optional<Archive::Child>> Archive::findSym(StringRef name) const { Expected<Optional<Archive::Child>> Archive::findSym(StringRef name) const {

Archive::symbol_iterator bs = symbol_begin(); Archive::symbol_iterator bs = symbol_begin();

[18 lines not shown]
