Add support of AIX Big Archive. Read only.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Deal correctly with memory (destructor, operator=), add test. No regression found.
It is OK for review in point of view.
Added a few others that review binutils. Thanks for the patch!
Looks like the Windows pre-merge check is failing: https://reviews.llvm.org/harbormaster/unit/view/554528/. I can reproduce locally with ninja check-llvm-object. It looks like the archive is empty? Maybe try regenerating it and uploading the patch again.
$ file llvm/test/Object/Inputs/Big.a llvm/test/Object/Inputs/Big.a: empty
While building this, there's a warning about an unhandled switch:
/home/rupprecht/src/llvm-project/llvm/lib/Object/ArchiveWriter.cpp:132:11: warning: enumeration value 'K_XCOFF' not handled in switch [-Wswitch] switch (Kind) { ^ /home/rupprecht/src/llvm-project/llvm/lib/Object/ArchiveWriter.cpp:197:11: warning: enumeration value 'K_XCOFF' not handled in switch [-Wswitch] switch (Kind) { ^
(Didn't get around to reviewing the main source changes in depth yet...)
llvm/test/Object/archive-big-read.test | ||
---|---|---|
1 | Instead of cd'ing to the source tree, these commands should be executed in the build tree -- i.e. drop this line, and change below to llvm-ar tv %p/Inputs/Big.a | |
3 | nit: it isn't technically required, but this should start with a comment (i.e. # Test reading ...) to separate it more clearly from run/check commands |
I've not got the time to review this properly currently.
Of the brief glance I gave this so far:
- There doesn't seem to be as much testing as I'd expect given the amount of new code.
- Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).
- I'm not a fan of the complication of the existing archive header code by converting it into a virtual hierarchy. That being said, I haven't looked at the rest of the code properly to give any thought as to whether there's a better approach.
Finally, what's the use-case? Why do you need this functionality?
First, sorry for the noise, I do not master arc yet.
I have updated the test file according comments.
Looks like the Windows pre-merge check is failing: https://reviews.llvm.org/harbormaster/unit/view/554528/. I can reproduce locally with ninja check-llvm-object. It looks like the archive is empty? Maybe try regenerating it and uploading the patch again.
I have updated it correctly. Thanks.
While building this, there's a warning about an unhandled switch:
It occurs on ArchiveWriter.cpp. I implement a way to write Big Archive. If you think it is better, I can add some code to remove warning, but my goal is to implement read and write for Big Archive, so it will be corrected in a future commit / PR.
There doesn't seem to be as much testing as I'd expect given the amount of new code.
Read archive has list (t), extract (x) and print (p) operation. x and p are the same code behind. I test t and p. What do you want to add?
Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).
I agree it should be avoided. Unfortunately, we cannot create archive at run time, as the write operation is not implemented yet (I work on it). Moreover, yaml2obj has not been ported on AIX (XCOFF object format). In future, when write operation will be implement for Big Archive, binary should be delete.
I'm not a fan of the complication of the existing archive header code by converting it into a virtual hierarchy. That being said, I haven't looked at the rest of the code properly to give any thought as to whether there's a better approach.
We must deal with two struct with different size (ArMemHd). I have not found another clean way to deal with. Moreover, Big Archive are quite a lot different than other archive. Terminator is cosmetic, Fix Size Header is different than Object Header, way to deal with object name is totally different and so on...
Finally, what's the use-case? Why do you need this functionality?
I want to port Rust on AIX. Rust needs archive and it uses llvm library to deal with it. It is not possible (or really hard) to use an external tool, like system ar. So, I must implement read and write support for Big Archive (AIX archive). This PR is the first part: read Big Archive.
As far as I know, it will also be useful to port Clang on AIX, as it will permit to port the CreateExportLists tool. This tool is used to export symbol, a required step to produce correctly binaries on AIX.
Updating D100651: [AIX] Support of Big archive (read)
Previous version made the assumption libLLVM was loaded once per archive read. It was OK with llvm-ar and all check I have performed, but it is wrong with rust: rust loads libLLVM, and then loads all libraries.
I have corrected this by removing a static variable and by changing way llvm checks if it is the Fixed Size Header or not.
[AIX] Support of Big archive (read)
One line change to decrease usage of CurrentLocation static variable.
While building this, there's a warning about an unhandled switch:
It occurs on ArchiveWriter.cpp. I implement a way to write Big Archive. If you think it is better, I can add some code to remove warning, but my goal is to implement read and write for Big Archive, so it will be corrected in a future commit / PR.
Many people build with -Werror, so it would be good to handle the case even if it's explicitly ignored, like:
case K_XCOFF: llvm_unreachable("Not handled yet");
Using a prebuilt binary should generally be avoided. You should create the archive at test run time, by using llvm-ar or some other tool (e.g. expanding yaml2obj to support this file format).
I agree it should be avoided. Unfortunately, we cannot create archive at run time, as the write operation is not implemented yet (I work on it). Moreover, yaml2obj has not been ported on AIX (XCOFF object format). In future, when write operation will be implement for Big Archive, binary should be delete.
Generally I agree that prebuilt binaries should be avoided for testing (mostly due to being opaque), but there's also the case that we should have at least one prebuilt binary that we use to avoid symmetric bugs (imagine a bug in a bad archive being created being missed because the archive reader was also changed to read it incorrectly).
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
41 ↗ | (On Diff #347942) | The clone() method should return a std::unique_ptr<AbstractArchiveMemberHeader> to avoid raw memory management |
231 ↗ | (On Diff #347942) | There's a lot of misc formatting changes in this patch. I formatted everything in e41aaea26238d0a5cb19163863819786e24f0e02, so if you rebase past that, the diff will be cleaner. |
391 ↗ | (On Diff #347942) | Does this need to be static? e.g. this could break if any tool needed to read two separate big archives |
I have modified ArchiveWriter.cpp to avoid the Warning.
I used now an unique_ptr. Code was updated (and simplified) as a consequence. Changes are mainly in Child declaration in Archive.h and Child constructors in Archive.cpp.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
391 ↗ | (On Diff #347942) | As far as I know, static variable is needed. Location must be modified by Child and must be stored in Parent. As Child cannot modify Parent, it is the easiest way to deal with. |
Hi, @rupprecht, @jhenderson,
Any news?
It will be better if this PR (and D104367) can be integrated in next LLVM version, and the merge windows will be closed soon.
Thanks.
I was leaving @rupprecht to continue reviewing this, but will try to find time to take a look further in the coming days if he doesn't.
Okay, I've spent a bit of time doing some more reviewing here.
Some high-level comments:
- You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).
- Update the review summary to be the body of the proposed commit message.
- The "Big Archive" format seems to be an AIX-specific thing. I wonder if we should refer to it as the "AIX Big Archive" format everywhere, to avoid confusion. See also inline comments.
- I think you should be using the archive member headers directly more often.
I haven't had time to fully go through this review, but this should get you started.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
337 ↗ | (On Diff #348513) | Archives can generally hold files of any format, including non-object files like text files. It seems to me the corect name should be K_AIXBIG. What do you think? |
llvm/lib/Object/Archive.cpp | ||
763–766 ↗ | (On Diff #348513) | This comment implies an empty archive is the same across all formats, but it looks like that's not the case for the Big Archive format. As I understand from my reading, we can't iterate over children to see if the archive is truly empty using the GNU format, for an AIX archive, so the comment needs updating. |
801 ↗ | (On Diff #348513) | I'd move this logic below the following comment, since although the process is different, it is still determining the archive format. You can simply add a note saying something like: // AIX Big archive format // Identified purely by magic bytes and uses a unique format. |
803–804 ↗ | (On Diff #348513) | Don't reflow your comments before the 80-character column width. Let clang-format do that for you. I'm not really sure what this comment is trying to say, if I'm honest. |
1016 ↗ | (On Diff #348513) | Do you actually need to change this method in the first instance? You don't have any testing for symbol printing (and I don't think you should at this point...). |
1021–1024 ↗ | (On Diff #348513) | |
llvm/lib/Object/ArchiveWriter.cpp | ||
141 ↗ | (On Diff #348513) | Make sure to clang-format all your modified code. I'd actually use report_fatal_error() rather than llvm_unreachable(), if the code could be reached by a tool receiving a file of this wrong kind, with the corresponding options. If it is genuinely unreachable of course, there is no need to change. Same below. |
llvm/test/Object/archive-big-read.test | ||
1 | Two related nits:
Also, comments should end with a full stop ('.') - applies below too. | |
2 | I'd rename the binary to bigFormat.a to avoid confusing it with a large regular archive. Also, a personal preference is to use double dash for long options (i.e. --strict-whitespace). Also, I'd add --implicit-check-not={{.}} to the FileCheck line. Without it the following output would be successfully matched against, because FileCheck matches sub-strings: rw-r--r-- 0/0 8 Apr 21 14:12 2021 evenlendsabdjasbababfjbasjk It also ensures there's no output before the first line or after the last checked line. | |
7 | I think we should test each operation in isolation. In other words, test t in a separate file to p. One is about reading the archive member list, the other about the archive member contents, so although the latter relies on the former, the inverse isn't true. | |
8 | I'm assuming the file contents are "evenlen". I'd actually be inclined to change the contents to be different to the member name. That way, it prevents a bug where the operation is returning the wrong thing. I think you should also add to the FileCheck line --implicit-check-not={{.}}. This will ensure that there is no other output printed, as by default FileCheck looks for sub-strings. |
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
371–373 ↗ | (On Diff #348513) | Do you really need this length? Why is it uint32_t? |
391 ↗ | (On Diff #347942) | This is completely unthread safe - if you have a tool parsing two archives simultaneously, this will fail horribly. Do you really need it though? See my comments further down. From reading the spec, the archive uses offsets everywhere and it should be simple to use the values stored in the member headers. |
llvm/lib/Object/Archive.cpp | ||
484 ↗ | (On Diff #348513) | Don't add this comment. We can see that these are child constructors by the function signature. |
490 ↗ | (On Diff #348513) | Use std::make_unique to create a unique pointer without needing to explicitly mention new. Same everywhere else you've done this. |
499 ↗ | (On Diff #348513) | Don't add a blank line at the function start. |
648 ↗ | (On Diff #348513) | It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file. |
You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).
The IBM team that has sent PR about llvm-readobj is aware of this PR. I will recontact them.
I think I have respond to most of your comment. The main issue is the static variable. I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.
Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
371–373 ↗ | (On Diff #348513) | Maximal offset (in fixed length header) is stored as 20 bytes as decimal digit. So, maximal length is approximately 10^10. It is the total length of the archive. So, it must be uint64-t, not uint32_t. |
391 ↗ | (On Diff #347942) | Main trouble is with first read, to distinguish Fixed length header and other one. |
llvm/lib/Object/Archive.cpp | ||
648 ↗ | (On Diff #348513) |
Right.
But using offset needs to know the current location. You say in another comment this his highly not thread-safe and might be avoid. |
1016 ↗ | (On Diff #348513) | I have removed code related to symbol read. |
llvm/lib/Object/ArchiveWriter.cpp | ||
141 ↗ | (On Diff #348513) | It can be reached by external tools (not llvm-ar), so I modifie it to use report_fatal_error(). |
llvm/test/Object/archive-big-read.test | ||
1 | I think it is OK now. | |
3 | Done. See archive-big-extract.test. |
I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.
I still don't understand why you need to know it to make this distinction. The two header types are completely different things as far as I understand the spec, used in different contexts. Could you read the Fixed Length header as the first part of reading the archive, and the object headers when reading individual children, then store them as members of the respective classes?
Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.
Just because this works in one single case does not make this an appropriate design - in fact you haven't even tested the case where the issue is, as I bet that the thing you ran read the archives sequentially, not in parallel. What if I wrote a tool that wanted to read multiple Big-format archives in parallel, with one thread per archive? What would happen to the static variable on each individual thread?
Pseudo-code example reading algorithm that doesn't have the issues mentioned:
class Archive { FixedLengthHeader Header; std::vector<Child> Children; ArrayRef<uint8_t> Data; Archive(ArrayRef<uint8_t> Data) : Data(Data) { Header = reinterpret_cast<FixedLengthHeader> (Data.data()); uint64_t Offset = Header->FirstChildOffset; // Check if archive is empty. Child C(Offset, this); while (C != nullptr) { Children.push_back(C); C = C.nextChild(); } } }; class Child { ObjectHeader Header; uint8_t *Payload; Archive &Parent; Child(uint64_t Offset, Archive &Archive) { // Check Archive data is large enough for offset + header size + payload size. Parent = Archive; Header = reinterpret_cast<ObjectHeader>(Parent->Data.data() + Offset); Payload = Parent->Data.data() + Offset + sizeof(Header); } Child *nextChild() { if (Header->nextOffset == 0) return nullptr; return Child(Header->nextOffset, Parent); } };
llvm/lib/Object/Archive.cpp | ||
---|---|---|
648 ↗ | (On Diff #348513) |
I don't understand this comment at all. Do you mean forgotten, not forbidden? Perhaps you could change how things are read and stored? |
llvm/test/Object/archive-big-read.test | ||
1 | Older tests use older styles. We have developed better styles in the meantime which are preferred for newer tests. |
Hi @jhenderson,
I cannot follow exactly what you propose, because llvm-ar (and maybe other tools) uses directly child_iterator. As this object does not have access to Archive (=Parent), it seems difficult to transfert Child vector of Archive to child_iterator. Moreover, I want to avoid to un-const Parent in Child.
Using the same idea, but not the same exact way, I propose:
- Move FixedLengthHeader on Archive, and use it for Big Archive only.
- Deal with first read (Fixed Length Header) in Archive creation, so any Child after do not care about it (one of the reason of static variable in previous version).
- To distinguish Object and Member Table, I cannot use Next Member in Object Header, because it points to Member Table in the last Object; I cannot use Offset to Member Table of Fixed Length Header, because I do not memorize the current position. So, I use a trick: an Object must have a name (adding empty-file to an archive is non valid), and Member Table Header does not have name, so name field is "`\n" (terminator).
- The child_begin_bigarchive() function is equivalent to child_begin(), but it deals correctly with Fixed Length Header.
getSize() and Big Archive Header initialiser are simplified, differences between usual Archive and Big Archive is deal during Child and Archive initialisation (so, correctly separated).
Free List is not yet supported: if you remove object in Archive, it will become unreadable by LLVM. Free List is complex, and not fully documented. I use report_fatal_error() if Free List is present in Archive (so, not 0), or if it is impossible to read it.
llvm/test/Object/archive-big-extract.test | ||
---|---|---|
1 ↗ | (On Diff #361218) | Can you please add the command of creating the ar file to the comment? Base on this patch, I am trying to enable tools (e.g. llvm-readobj) for ar format, but I am getting an error from BigArchiveMemberHeader::getSize() when applying this patch. # Create the ar file under AIX $ ar -v -q 1.a 1.o $ llvm-ar p 1.a evenlen llvm-ar: error: unable to load '2.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128) |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1013 ↗ | (On Diff #361218) | It's not correct to calculate the offset of the first archive member by Data.getBufferStart() + strlen(Magic) + sizeof(Archive::ArFixLenHdrType);, please use the value of ArFixLenHdr->FirstArOffset. $ xlc 1.c -o 1.o $ ar -v -q 1.a 1.o $ llvm-ar tv 1.a llvm-ar: error: unable to load '1.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128) The correct offset for this case should be 138 instead of 128. |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1013 ↗ | (On Diff #361218) | First, LLVM is no more a priority of our team, so I cannot spend time to this PR. // small.c int add_two (int a) { return a + 2; } $ xlc -v C for AIX Compiler, Version 5 OK, really old, but we don't care for archive. $ xlc small.c -o small.o -c $ ar -v -q small.a small.o $ xxd small.a 00000000: 3c62 6967 6166 3e0a 3130 3730 2020 2020 <bigaf>.1070 00000010: 2020 2020 2020 2020 2020 2020 3132 3332 1232 00000020: 2020 2020 2020 2020 2020 2020 2020 2020 00000030: 3020 2020 2020 2020 2020 2020 2020 2020 0 00000040: 2020 2020 3132 3820 2020 2020 2020 2020 128 00000050: 2020 2020 2020 2020 3132 3820 2020 2020 128 00000060: 2020 2020 2020 2020 2020 2020 3020 2020 0 00000070: 2020 2020 2020 2020 2020 2020 2020 2020 // End of Fixed-Length Header: size is 128 00000080: 3832 3020 2020 2020 2020 2020 2020 2020 820 00000090: 2020 2020 3130 3730 2020 2020 2020 2020 1070 000000a0: 2020 2020 2020 2020 3020 2020 2020 2020 0 000000b0: 2020 2020 2020 2020 2020 2020 3136 3331 1631 000000c0: 3630 3435 3039 2020 3020 2020 2020 2020 604509 0 000000d0: 2020 2020 3020 2020 2020 2020 2020 2020 0 000000e0: 3634 3420 2020 2020 2020 2020 3720 2020 644 7 [...] $ llvm-ar --version LLVM version 13.0.0 $ llvm-ar t small.a small.o # OK Third, according documentation https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big , Fixed-Length Header must have a size of 128. 138 is not valide. |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1013 ↗ | (On Diff #361218) | Thanks for your reply!
You can also reproduce the issue if you add a dynamic lib to an archive. $ cat 1.c int main() { return 1; } $ xlc 1.c -qmkshrobj -o libt.so $ ar -v -q 1.a libt.so $ xxd 1.a 00000000: 3c62 6967 6166 3e0a 3134 3532 2020 2020 <bigaf>.1452 00000010: 2020 2020 2020 2020 2020 2020 3136 3134 1614 00000020: 2020 2020 2020 2020 2020 2020 2020 2020 00000030: 3020 2020 2020 2020 2020 2020 2020 2020 0 00000040: 2020 2020 3133 3420 2020 2020 2020 2020 134 00000050: 2020 2020 2020 2020 3133 3420 2020 2020 134 00000060: 2020 2020 2020 2020 2020 2020 3020 2020 0 00000070: 2020 2020 2020 2020 2020 2020 2020 2020 00000080: 0000 0000 0000 3131 3935 2020 2020 2020 ......1195 // There are some padding between the Fixed-Length Header and the First member. 00000090: 2020 2020 2020 2020 2020 3134 3532 2020 1452 000000a0: 2020 2020 2020 2020 2020 2020 2020 3020 0 000000b0: 2020 2020 2020 2020 2020 2020 2020 2020
|
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1013 ↗ | (On Diff #361218) | Using ArFixLenHdr->FirstArOffset does not work. truncated or malformed archive (characters in name length field in archive header are not all decimal numbers: '' for archive member header at offset 68) I do not know where this "68" is from. In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests. |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1013 ↗ | (On Diff #361218) | OK, thanks! |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1013 ↗ | (On Diff #361218) | My mistake, you are right. I take FirstArOffset directly... |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
102 ↗ | (On Diff #361218) | There are common source code in the BigArchiveMemberHeader::BigArchiveMemberHeader and ArchiveMemberHeader::ArchiveMemberHeader , we can put the common code in the constructor of AbstractArchiveMemberHeader |
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
330 ↗ | (On Diff #361218) | I do not think we need a special child_begin_bigarchive() API for aix big ar here, |
Hi @DiggerLin,
I do not have time to improve this PR before begin of November. We have other priorities right now.
This PR has nearly 6 months, so, it can wait one more.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
101 ↗ | (On Diff #361218) | both the ArchiveMemberHeader and the BigArchiveMemberHeader has the "Archive const *Parent;" , can the member put in the AbstractArchiveMemberHeader ? |
llvm/lib/Object/Archive.cpp | ||
60 ↗ | (On Diff #361218) | if the RawHeaderPtr == nullptr is true , the value of ArMemHdr will be random here. |
106 ↗ | (On Diff #361218) | same problem as above. |
163 ↗ | (On Diff #361218) | the variable name "end" can not express the mean, maybe be "NameSize" or "NameLen" ? |
308–320 ↗ | (On Diff #361218) | there are several place use the almost the same code from line 308 to 320 . can we change these lines into a helper function? |
558 ↗ | (On Diff #361218) | change "Name.size() + Name.size() % 2" to ((Name.size()+1) >>1 ) << 1 ? |
561 ↗ | (On Diff #361218) | the Name.startswith("#1/")) is not for the Archive::K_AIXBIG. what happen if the AIXBIG archive file which has member name begin with "#/" here ? |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
269 ↗ | (On Diff #361218) | it looks the parameter Size not be used here? |
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
330 ↗ | (On Diff #361218) | also using a separate interface child_begin_bigarchive , you can not use the API iterator_range<child_iterator> children(Error &Err, bool SkipInternal = true) const { return make_range(child_begin(Err, SkipInternal), child_end()); |
the function file_magic llvm::identify_magic(StringRef Magic) should be modified to add big ar identification. otherwise object::createBinary will be error for big ar.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
330 ↗ | (On Diff #361218) | or you can call child_begin_bigarchive() in child_begin(Error &Err, bool SkipInternal = true) when Format is K_AIXBIG() |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
308–320 ↗ | (On Diff #361218) | and I just wonder why you need the code std::string Buf; raw_string_ostream OS(Buf); OS.write_escaped( StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" ")); OS.flush(); to convert to std::string. for the StringRef , there is a function str() to convert to std::string, why not use str() of StringRef ? |
can you address the comment ,update the patch? we will approve it and We will create a following patch for the "Support of Big archive" @EGuesnet
In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests.
As @EGuesnet mentioned above, I think he won't have time to address the comments in time.
However, we do depend on this patch for AIX in near future,
so I would suggest that we *accept* this patch *as it is* for now, and commit it as @EGuesnet suggested.
And @DiggerLin will post follow up patches immediately.
Is that OK for reviewers? @jhenderson @rupprecht @Esme ?
This patch is not in a suitable state ready for commit. I've highlighted a high number of issues, and I think there are some fundamental issues with the approach, which need addressing before it is acceptable to land this in LLVM. I haven't finished reviewing the whole patch yet either. I'm sorry if you need the patch in the near future. I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
40 ↗ | (On Diff #361218) | This comment is superfluous. clone always means this. |
51 ↗ | (On Diff #361218) | getOffset isn't named getRawOffset, so doesn't look like it belong in this block. getRawName on the other hand looks like it does belong here. Please make sure all comments follow the existing style. At the moment, comments are doxygen style (see e.g. getName). LLVM style also uses trailing full stops on comments - please add them where they are missing. That being said, don't bother with comments that provide no useful benefit beyond the method names. For example, you do not need this comment or the "Non-raw getters" comment: what benefit do they provide? |
106 ↗ | (On Diff #361218) | Please order const according to standard LLVM style, i.e. const T *. If you're changing a function anyway, it makes sense to fix the style. |
108 ↗ | (On Diff #361218) | Please separate unrelated method declarations, or methods with non-single-line definitions with a blank line. |
166 ↗ | (On Diff #361218) | Do we actually need all these new constructors and assignment operators? Are they used? |
168 ↗ | (On Diff #361218) | Under what circumstances can C.Header be a nullptr? |
171 ↗ | (On Diff #361218) | Nit: add new line before this method. |
172 ↗ | (On Diff #361218) | Parent is a raw pointer, there's no need to std::move it. |
360 ↗ | (On Diff #361218) | Nit: please use // comments in this struct, as per the rest of the code. As I understand it, this is a specific BigArchive class. Please rename it to be clear, e.g. BigArchiveFixLenHdr. |
368 ↗ | (On Diff #361218) | I think the presence of this member shows that the design is not right. BigArchive is a kind of Archive. The members specific to BigArchive types (as opposed to e.g. traditional BSD or GNU archives) should not be present in a type that should be generic. I think rather than, or possibly as well as, having this member, you really want to have concrete Archive subclasses that implement the relevant logic, and make Archive an abstract class. |
llvm/lib/Object/Archive.cpp | ||
46 ↗ | (On Diff #361218) | I don't like this define. It is not guaranteed that all future archive magic is 8 characters long. I don't think it would be necessary if we made the Archive class a class hierarchy as suggested above. I'm also slightly surprised that BigMagic doesn't start with !, but I assume that is correct and intentional. |
64 ↗ | (On Diff #361218) | Why has this code moved? |
127 ↗ | (On Diff #361218) | Terminator characters are cosmetic only, really, in regular archives too, but we still check them. |
159–160 ↗ | (On Diff #361218) | This comment has several typos in, and I'm not really sure what it is actually trying to say. |
305–307 ↗ | (On Diff #361218) | I'd prefer that this calculation be factored out into at least a variable, as I don't follow what it represents. |
311 ↗ | (On Diff #361218) | This pattern is a duplicate of most of the earlier expression. Put it in a variable rather than duplicating code. |
412 ↗ | (On Diff #361218) | getRawName appears not to trim the trailing spaces, but getRawUID does. This seems like a logical inconsistency that will likely lead to bugs. |
488 ↗ | (On Diff #361218) | These if/else checks clearly indicate that Child should really be BigArchiveChild and something else, sharing a common Child abstract base class. |
646–648 ↗ | (On Diff #361218) | Revert this change. |
682–683 ↗ | (On Diff #361218) | This looks like an unrelated formatting change. |
I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).
Thanks @jhenderson ! That would work too . @DiggerLin Let us do it -- you can create another patch based on this and give credit to @EGuesnet if he can NOT address the comment in time.
Hi @jhenderson, @jsji , @DiggerLin,
As I said before, I do not have time to improve this PR before one or two months at least, as our priorities have shifted.
My proposal to @Esme about accepting the patch with minor changes was OK but now there are too much changes requested by reviewers.
So, I accept that you start an improvement / alternative version as long as you credit.
Please tag me when you will open your PR.
For your information, I have opened another PR about writing Big Archive here: https://reviews.llvm.org/D104367 .
Great! Thank you for your kind and contribution to this! @EGuesnet
@DiggerLin Please credit and tag @EGuesnet in your new patch. Thanks.
Instead of cd'ing to the source tree, these commands should be executed in the build tree -- i.e. drop this line, and change below to llvm-ar tv %p/Inputs/Big.a