Add support of AIX Big Archive. Read only.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
371–373 | Do you really need this length? Why is it uint32_t? | |
376 | This is completely unthread safe - if you have a tool parsing two archives simultaneously, this will fail horribly. Do you really need it though? See my comments further down. From reading the spec, the archive uses offsets everywhere and it should be simple to use the values stored in the member headers. | |
llvm/lib/Object/Archive.cpp | ||
484 | Don't add this comment. We can see that these are child constructors by the function signature. | |
490 | Use std::make_unique to create a unique pointer without needing to explicitly mention new. Same everywhere else you've done this. | |
499 | Don't add a blank line at the function start. | |
648 | It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file. |
You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).
The IBM team that has sent PR about llvm-readobj is aware of this PR. I will recontact them.
I think I have respond to most of your comment. The main issue is the static variable. I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.
Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
371–373 | Maximal offset (in fixed length header) is stored as 20 bytes as decimal digit. So, maximal length is approximately 10^10. It is the total length of the archive. So, it must be uint64-t, not uint32_t. | |
376 | Main trouble is with first read, to distinguish Fixed length header and other one. | |
llvm/lib/Object/Archive.cpp | ||
648 |
Right.
But using offset needs to know the current location. You say in another comment this his highly not thread-safe and might be avoid. | |
1016 | I have removed code related to symbol read. | |
llvm/lib/Object/ArchiveWriter.cpp | ||
141 | It can be reached by external tools (not llvm-ar), so I modifie it to use report_fatal_error(). | |
llvm/test/Object/archive-big-read.test | ||
1 | I think it is OK now. | |
4 | Done. See archive-big-extract.test. |
I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.
I still don't understand why you need to know it to make this distinction. The two header types are completely different things as far as I understand the spec, used in different contexts. Could you read the Fixed Length header as the first part of reading the archive, and the object headers when reading individual children, then store them as members of the respective classes?
Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.
Just because this works in one single case does not make this an appropriate design - in fact you haven't even tested the case where the issue is, as I bet that the thing you ran read the archives sequentially, not in parallel. What if I wrote a tool that wanted to read multiple Big-format archives in parallel, with one thread per archive? What would happen to the static variable on each individual thread?
Pseudo-code example reading algorithm that doesn't have the issues mentioned:
class Archive { FixedLengthHeader Header; std::vector<Child> Children; ArrayRef<uint8_t> Data; Archive(ArrayRef<uint8_t> Data) : Data(Data) { Header = reinterpret_cast<FixedLengthHeader> (Data.data()); uint64_t Offset = Header->FirstChildOffset; // Check if archive is empty. Child C(Offset, this); while (C != nullptr) { Children.push_back(C); C = C.nextChild(); } } }; class Child { ObjectHeader Header; uint8_t *Payload; Archive &Parent; Child(uint64_t Offset, Archive &Archive) { // Check Archive data is large enough for offset + header size + payload size. Parent = Archive; Header = reinterpret_cast<ObjectHeader>(Parent->Data.data() + Offset); Payload = Parent->Data.data() + Offset + sizeof(Header); } Child *nextChild() { if (Header->nextOffset == 0) return nullptr; return Child(Header->nextOffset, Parent); } };
llvm/lib/Object/Archive.cpp | ||
---|---|---|
648 |
I don't understand this comment at all. Do you mean forgotten, not forbidden? Perhaps you could change how things are read and stored? | |
llvm/test/Object/archive-big-read.test | ||
1 | Older tests use older styles. We have developed better styles in the meantime which are preferred for newer tests. |
Hi @jhenderson,
I cannot follow exactly what you propose, because llvm-ar (and maybe other tools) uses directly child_iterator. As this object does not have access to Archive (=Parent), it seems difficult to transfert Child vector of Archive to child_iterator. Moreover, I want to avoid to un-const Parent in Child.
Using the same idea, but not the same exact way, I propose:
- Move FixedLengthHeader on Archive, and use it for Big Archive only.
- Deal with first read (Fixed Length Header) in Archive creation, so any Child after do not care about it (one of the reason of static variable in previous version).
- To distinguish Object and Member Table, I cannot use Next Member in Object Header, because it points to Member Table in the last Object; I cannot use Offset to Member Table of Fixed Length Header, because I do not memorize the current position. So, I use a trick: an Object must have a name (adding empty-file to an archive is non valid), and Member Table Header does not have name, so name field is "`\n" (terminator).
- The child_begin_bigarchive() function is equivalent to child_begin(), but it deals correctly with Fixed Length Header.
getSize() and Big Archive Header initialiser are simplified, differences between usual Archive and Big Archive is deal during Child and Archive initialisation (so, correctly separated).
Free List is not yet supported: if you remove object in Archive, it will become unreadable by LLVM. Free List is complex, and not fully documented. I use report_fatal_error() if Free List is present in Archive (so, not 0), or if it is impossible to read it.
llvm/test/Object/archive-big-extract.test | ||
---|---|---|
1 ↗ | (On Diff #361218) | Can you please add the command of creating the ar file to the comment? Base on this patch, I am trying to enable tools (e.g. llvm-readobj) for ar format, but I am getting an error from BigArchiveMemberHeader::getSize() when applying this patch. # Create the ar file under AIX $ ar -v -q 1.a 1.o $ llvm-ar p 1.a evenlen llvm-ar: error: unable to load '2.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128) |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1011 | It's not correct to calculate the offset of the first archive member by Data.getBufferStart() + strlen(Magic) + sizeof(Archive::ArFixLenHdrType);, please use the value of ArFixLenHdr->FirstArOffset. $ xlc 1.c -o 1.o $ ar -v -q 1.a 1.o $ llvm-ar tv 1.a llvm-ar: error: unable to load '1.a': truncated or malformed archive (characters in size field in archive header are not all decimal numbers: '\000\000\000\000\000\000\000\000\000\0005892' for archive member header at offset 128) The correct offset for this case should be 138 instead of 128. |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1011 | First, LLVM is no more a priority of our team, so I cannot spend time to this PR. // small.c int add_two (int a) { return a + 2; } $ xlc -v C for AIX Compiler, Version 5 OK, really old, but we don't care for archive. $ xlc small.c -o small.o -c $ ar -v -q small.a small.o $ xxd small.a 00000000: 3c62 6967 6166 3e0a 3130 3730 2020 2020 <bigaf>.1070 00000010: 2020 2020 2020 2020 2020 2020 3132 3332 1232 00000020: 2020 2020 2020 2020 2020 2020 2020 2020 00000030: 3020 2020 2020 2020 2020 2020 2020 2020 0 00000040: 2020 2020 3132 3820 2020 2020 2020 2020 128 00000050: 2020 2020 2020 2020 3132 3820 2020 2020 128 00000060: 2020 2020 2020 2020 2020 2020 3020 2020 0 00000070: 2020 2020 2020 2020 2020 2020 2020 2020 // End of Fixed-Length Header: size is 128 00000080: 3832 3020 2020 2020 2020 2020 2020 2020 820 00000090: 2020 2020 3130 3730 2020 2020 2020 2020 1070 000000a0: 2020 2020 2020 2020 3020 2020 2020 2020 0 000000b0: 2020 2020 2020 2020 2020 2020 3136 3331 1631 000000c0: 3630 3435 3039 2020 3020 2020 2020 2020 604509 0 000000d0: 2020 2020 3020 2020 2020 2020 2020 2020 0 000000e0: 3634 3420 2020 2020 2020 2020 3720 2020 644 7 [...] $ llvm-ar --version LLVM version 13.0.0 $ llvm-ar t small.a small.o # OK Third, according documentation https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big , Fixed-Length Header must have a size of 128. 138 is not valide. |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1011 | Thanks for your reply!
You can also reproduce the issue if you add a dynamic lib to an archive. $ cat 1.c int main() { return 1; } $ xlc 1.c -qmkshrobj -o libt.so $ ar -v -q 1.a libt.so $ xxd 1.a 00000000: 3c62 6967 6166 3e0a 3134 3532 2020 2020 <bigaf>.1452 00000010: 2020 2020 2020 2020 2020 2020 3136 3134 1614 00000020: 2020 2020 2020 2020 2020 2020 2020 2020 00000030: 3020 2020 2020 2020 2020 2020 2020 2020 0 00000040: 2020 2020 3133 3420 2020 2020 2020 2020 134 00000050: 2020 2020 2020 2020 3133 3420 2020 2020 134 00000060: 2020 2020 2020 2020 2020 2020 3020 2020 0 00000070: 2020 2020 2020 2020 2020 2020 2020 2020 00000080: 0000 0000 0000 3131 3935 2020 2020 2020 ......1195 // There are some padding between the Fixed-Length Header and the First member. 00000090: 2020 2020 2020 2020 2020 3134 3532 2020 1452 000000a0: 2020 2020 2020 2020 2020 2020 2020 3020 0 000000b0: 2020 2020 2020 2020 2020 2020 2020 2020
|
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1011 | Using ArFixLenHdr->FirstArOffset does not work. truncated or malformed archive (characters in name length field in archive header are not all decimal numbers: '' for archive member header at offset 68) I do not know where this "68" is from. In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests. |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1011 | OK, thanks! |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
1011 | My mistake, you are right. I take FirstArOffset directly... |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
104 | There are common source code in the BigArchiveMemberHeader::BigArchiveMemberHeader and ArchiveMemberHeader::ArchiveMemberHeader , we can put the common code in the constructor of AbstractArchiveMemberHeader |
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
343 | I do not think we need a special child_begin_bigarchive() API for aix big ar here, |
Hi @DiggerLin,
I do not have time to improve this PR before begin of November. We have other priorities right now.
This PR has nearly 6 months, so, it can wait one more.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
103–153 | both the ArchiveMemberHeader and the BigArchiveMemberHeader has the "Archive const *Parent;" , can the member put in the AbstractArchiveMemberHeader ? | |
llvm/lib/Object/Archive.cpp | ||
62 | if the RawHeaderPtr == nullptr is true , the value of ArMemHdr will be random here. | |
108 | same problem as above. | |
175 | the variable name "end" can not express the mean, maybe be "NameSize" or "NameLen" ? | |
320–391 | there are several place use the almost the same code from line 308 to 320 . can we change these lines into a helper function? | |
564 | change "Name.size() + Name.size() % 2" to ((Name.size()+1) >>1 ) << 1 ? | |
572 | the Name.startswith("#1/")) is not for the Archive::K_AIXBIG. what happen if the AIXBIG archive file which has member name begin with "#/" here ? |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
281 | it looks the parameter Size not be used here? |
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
343 | also using a separate interface child_begin_bigarchive , you can not use the API iterator_range<child_iterator> children(Error &Err, bool SkipInternal = true) const { return make_range(child_begin(Err, SkipInternal), child_end()); |
the function file_magic llvm::identify_magic(StringRef Magic) should be modified to add big ar identification. otherwise object::createBinary will be error for big ar.
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
343 | or you can call child_begin_bigarchive() in child_begin(Error &Err, bool SkipInternal = true) when Format is K_AIXBIG() |
llvm/lib/Object/Archive.cpp | ||
---|---|---|
320–391 | and I just wonder why you need the code std::string Buf; raw_string_ostream OS(Buf); OS.write_escaped( StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" ")); OS.flush(); to convert to std::string. for the StringRef , there is a function str() to convert to std::string, why not use str() of StringRef ? |
can you address the comment ,update the patch? we will approve it and We will create a following patch for the "Support of Big archive" @EGuesnet
In my opinion, this PR is complex enough. It might be accepted with as few changes as possible. So, without freelist, undocumented features... In a second time, you can create a new PR, to extend support of Big Archive, with new tests.
As @EGuesnet mentioned above, I think he won't have time to address the comments in time.
However, we do depend on this patch for AIX in near future,
so I would suggest that we *accept* this patch *as it is* for now, and commit it as @EGuesnet suggested.
And @DiggerLin will post follow up patches immediately.
Is that OK for reviewers? @jhenderson @rupprecht @Esme ?
This patch is not in a suitable state ready for commit. I've highlighted a high number of issues, and I think there are some fundamental issues with the approach, which need addressing before it is acceptable to land this in LLVM. I haven't finished reviewing the whole patch yet either. I'm sorry if you need the patch in the near future. I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).
llvm/include/llvm/Object/Archive.h | ||
---|---|---|
40–42 | This comment is superfluous. clone always means this. | |
51–56 | getOffset isn't named getRawOffset, so doesn't look like it belong in this block. getRawName on the other hand looks like it does belong here. Please make sure all comments follow the existing style. At the moment, comments are doxygen style (see e.g. getName). LLVM style also uses trailing full stops on comments - please add them where they are missing. That being said, don't bother with comments that provide no useful benefit beyond the method names. For example, you do not need this comment or the "Non-raw getters" comment: what benefit do they provide? | |
108 | Please order const according to standard LLVM style, i.e. const T *. If you're changing a function anyway, it makes sense to fix the style. | |
110 | Please separate unrelated method declarations, or methods with non-single-line definitions with a blank line. | |
179 | Do we actually need all these new constructors and assignment operators? Are they used? | |
181 | Under what circumstances can C.Header be a nullptr? | |
184 | Nit: add new line before this method. | |
185 | Parent is a raw pointer, there's no need to std::move it. | |
372 | Nit: please use // comments in this struct, as per the rest of the code. As I understand it, this is a specific BigArchive class. Please rename it to be clear, e.g. BigArchiveFixLenHdr. | |
380 | I think the presence of this member shows that the design is not right. BigArchive is a kind of Archive. The members specific to BigArchive types (as opposed to e.g. traditional BSD or GNU archives) should not be present in a type that should be generic. I think rather than, or possibly as well as, having this member, you really want to have concrete Archive subclasses that implement the relevant logic, and make Archive an abstract class. | |
llvm/lib/Object/Archive.cpp | ||
46 | I don't like this define. It is not guaranteed that all future archive magic is 8 characters long. I don't think it would be necessary if we made the Archive class a class hierarchy as suggested above. I'm also slightly surprised that BigMagic doesn't start with !, but I assume that is correct and intentional. | |
66 | Why has this code moved? | |
129 | Terminator characters are cosmetic only, really, in regular archives too, but we still check them. | |
171–172 | This comment has several typos in, and I'm not really sure what it is actually trying to say. | |
317–319 | I'd prefer that this calculation be factored out into at least a variable, as I don't follow what it represents. | |
323 | This pattern is a duplicate of most of the earlier expression. Put it in a variable rather than duplicating code. | |
429 | getRawName appears not to trim the trailing spaces, but getRawUID does. This seems like a logical inconsistency that will likely lead to bugs. | |
506 | These if/else checks clearly indicate that Child should really be BigArchiveChild and something else, sharing a common Child abstract base class. | |
659–665 | Revert this change. | |
701–703 | This looks like an unrelated formatting change. |
I can only suggest that you try implementing something yourself, perhaps by adopting and improving this patch, or starting an alternative version (inspired by this patch, and with credit given to @EGuesnet for their initial work, in the commit message).
Thanks @jhenderson ! That would work too . @DiggerLin Let us do it -- you can create another patch based on this and give credit to @EGuesnet if he can NOT address the comment in time.
Hi @jhenderson, @jsji , @DiggerLin,
As I said before, I do not have time to improve this PR before one or two months at least, as our priorities have shifted.
My proposal to @Esme about accepting the patch with minor changes was OK but now there are too much changes requested by reviewers.
So, I accept that you start an improvement / alternative version as long as you credit.
Please tag me when you will open your PR.
For your information, I have opened another PR about writing Big Archive here: https://reviews.llvm.org/D104367 .
Great! Thank you for your kind and contribution to this! @EGuesnet
@DiggerLin Please credit and tag @EGuesnet in your new patch. Thanks.
The clone() method should return a std::unique_ptr<AbstractArchiveMemberHeader> to avoid raw memory management