This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Object/ArchiveWriter.cpp
llvm/test/Object/archive-malformed-object.test
llvm/test/tools/llvm-ar/big-archive-xcoff-align.test

Differential D144872

[AIX] Align the content of an xcoff object file which has auxiliary header in big archive.
ClosedPublic

Authored by DiggerLin on Feb 27 2023, 6:51 AM.


Details

Reviewers

jhenderson
stephenpeckham
hubert.reinterpretcast
daltenty
MaskRay

Commits

rG4cc7c749c31e: [AIX] Align the content of an xcoff object file which has auxiliary header in…

Summary

If the member file is an XCOFF object file and has an auxiliary header, the content of the member file needs to be aligned at
MAX(maximum alignment of .text, maximum alignment of .data). The "maximum alignment of .text" and "maximum alignment of .data" are two
fields of the auxiliary header of the XCOFF object file.
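
As a rough illustration of the rule in this summary (a sketch, not the patch's code): assuming the two auxiliary-header fields hold log2 values, as the `Log2OfAIXPageSize` naming later in this review suggests, the member alignment is two raised to the larger exponent. The function name `memberAlignment` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: both arguments are assumed to be log2 alignment
// values as read from the XCOFF auxiliary header.
uint64_t memberAlignment(uint16_t log2TextAlign, uint16_t log2DataAlign) {
  // MAX(maximum alignment of .text, maximum alignment of .data).
  uint16_t log2Align = std::max(log2TextAlign, log2DataAlign);
  assert(log2Align < 64 && "shift would overflow uint64_t");
  return uint64_t{1} << log2Align;
}
```

For example, a .text alignment of 2^2 and a .data alignment of 2^3 would give a member alignment of 8 bytes.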

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline


Herald added a project: Restricted Project.Feb 27 2023, 6:51 AM

Herald added a subscriber: hiraditya.

DiggerLin requested review of this revision.Feb 27 2023, 6:51 AM

Herald added a project: Restricted Project.Feb 27 2023, 6:51 AM

Herald added a subscriber: llvm-commits.

DiggerLin added a parent revision: D142660: [AIX] supporting -X options for llvm-ranlib in AIX OS.Feb 27 2023, 6:52 AM

Harbormaster completed remote builds in B216212: Diff 500774.Feb 27 2023, 8:16 AM

I've started reviewing, but have run out of time for this for now. One high-level question: regular Unix archive member alignment is done at the end of a member (I believe), rather than the start of the next, meaning that the final member will have tail padding, if needed. I take it that this isn't the case here, since the alignment is purely to ensure aligned data in the object?

llvm/lib/Object/ArchiveWriter.cpp
563–565	This should be referring to the Big Archive format, right, not the OS? Other suggestions in the inline edit.
566	Two nits, and one more significant point. No need for `const &` for `StringRef`, which is intended to be copied. `ObjStringRef` -> `Obj`. This function appears to be reading in the file and parsing it just to get the alignments. However, presumably this isn't the only place where we have the fully parsed object, since at some point you have to know what to write in the object file in the first place, right? Wouldn't it make more sense to identify and record this alignment then?
573	Seems like this should report the error, not just ignore it...
589	No need for `else` after `return`.

stephenpeckham added inline comments.Mar 2 2023, 7:13 AM

llvm/lib/Object/ArchiveWriter.cpp
563–565	The alignment is not a requirement of the Big Archive format. It's required by the OS for 64-bit members and recommended for 32-bit members. AIX allows shared objects to be archive members. When archive members are loaded, they are mapped into memory. If the members aren't aligned properly in the archive, they won't be aligned in memory. Both .text and .data are mapped, so the required member alignment takes into account both the .text and .data alignment. Alignment is not necessary for members that are not loadable.
582	Only loadable objects need to be aligned. One requirement for a loadable module is the presence of a loader section. The o_snloader field in the auxiliary header can be checked.
879	It's possible to have a loadable object with a very large text or data alignment. A sanity check would be useful here. The AIX ar command caps the alignment at 2^12 (the typical PAGESIZE on an AIX system).
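
The three points above (align only loadable members, check the loader section, cap the alignment at 2^12) could be combined roughly as follows. This is a sketch under assumptions, not the code that landed: `AuxHeaderInfo` is a hypothetical, simplified view of the auxiliary header, a loader section number of 0 is assumed to mean "no loader section", and the minimum alignment of 2 reflects the even-byte boundary used for archive members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the auxiliary-header fields that matter.
struct AuxHeaderInfo {
  bool hasAuxHeader;
  uint16_t loaderSectionNumber; // o_snloader; 0 is assumed to mean "none"
  uint16_t log2TextAlign;       // assumed to be a log2 value
  uint16_t log2DataAlign;       // assumed to be a log2 value
};

constexpr uint16_t kLog2PageSize = 12; // cap mentioned for the AIX ar command
constexpr uint64_t kMinAlign = 2;      // even-byte boundary

uint64_t alignmentFor(const AuxHeaderInfo &h) {
  // Members that are not loadable only need the minimum alignment.
  if (!h.hasAuxHeader || h.loaderSectionNumber == 0)
    return kMinAlign;
  // Take the larger of the two alignments, capped at the page size.
  uint16_t log2Align =
      std::min(std::max(h.log2TextAlign, h.log2DataAlign), kLog2PageSize);
  uint64_t alignment = std::max(uint64_t{1} << log2Align, kMinAlign);
  assert(alignment >= kMinAlign && alignment <= (uint64_t{1} << kLog2PageSize));
  return alignment;
}
```

With these assumptions, a loadable member whose .text alignment is 2^14 would still be placed on a 4096-byte boundary.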

DiggerLin updated this revision to Diff 503149.Mar 7 2023, 2:02 PM

DiggerLin marked 4 inline comments as done.

DiggerLin added inline comments.

llvm/lib/Object/ArchiveWriter.cpp
566	yes

Harbormaster completed remote builds in B217944: Diff 503149.Mar 7 2023, 2:02 PM

The alignment calculations look good now, including handling special cases.

jhenderson added inline comments.Mar 16 2023, 2:23 AM

llvm/lib/Object/ArchiveWriter.cpp
335	No need for explicit assignment of `nullptr` - `unique_ptr`'s default constructor leaves it in an empty state.
544–561	At the moment, I'm struggling to follow parts of this comment so I'd like to propose rewording it as follows: "AIX Big Archives may contain shared object members. The AIX OS requires these members to be aligned if they are 64-bit and recommends it for 32-bit members. This ensures that when these members are loaded they are aligned in memory." I think the rest of the comment can be moved into the method body, and I'll comment as appropriate.
555	Perhaps you could flip this on its head. Something like:
XCOFFObjectFile *XCOFFObj = dyn_cast_or_null<XCOFFObjectFile>(SymObj);
if (!XCOFFObj)
  // Replace this comment with a comment that says why "2" is the right value.
  return 2;
...
556	I think I'd make this a free-standing function. The body is long and it doesn't capture any variables, so I don't think making it a lambda is particularly helpful for readability. I'd also rename `Size` to be clearer what size it represents (e.g. "AuxHeaderSize" if that is correct).
558	Add a comment like: "If the member doesn't have an auxiliary header, it isn't a loadable object and so doesn't need aligning."
561–562	It's not clear to me why if it is missing one or other of these that 2 is the right choice. Why is it not the other alignment value?
563–564	Please use static_cast or reinterpret_cast, not C-style casts. That being said, I'm struggling to follow the logic here. Is this essentially testing if the Header is too small to contain the MaxAlignOfData field? If so, is that actually a permitted case? The `+ 2` in particular is throwing me off though.
567
571–573
575	This value of 12 is a magic number that is rather meaningless to a reader of the code. Please stick it in a named constant somewhere. Also, is `AlignSize` (and `MaxAlignSize`) really an appropriate name for the variable? An alignment value isn't a size, so unless you are aligning a size field or something, it doesn't really make sense as a name. This line also needs a comment explaining what it is doing and why.
579
748	Why has this comment changed only to ADD a typo??
750–751	Why has this variable been renamed?
792	I'm not sure I understand the changes to this loop. Why do you need to know anything about the next member to know how much padding the current member requires?
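
The comment on line 335 above relies on `std::unique_ptr` being empty after default construction. A minimal sketch of that behaviour follows; `SymFileStub` and `MemberDataSketch` are hypothetical stand-ins, not LLVM types. It also shows the pattern the author's later reply describes, where every field of an aggregate is spelled out to keep `-Wmissing-field-initializers` quiet.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct SymFileStub { int id; }; // hypothetical stand-in for a symbolic file

struct MemberDataSketch {       // hypothetical aggregate with a pointer field
  std::string header;
  std::unique_ptr<SymFileStub> symFile;
};

// A default-constructed unique_ptr owns nothing and compares equal to nullptr.
bool defaultConstructedIsEmpty() {
  std::unique_ptr<SymFileStub> p;
  return p == nullptr;
}

MemberDataSketch makeMember(std::string header) {
  assert(defaultConstructedIsEmpty());
  // Every field is named explicitly in the braced initializer list.
  return {std::move(header), nullptr};
}
```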

DiggerLin updated this revision to Diff 510127.Mar 31 2023, 1:56 PM

DiggerLin marked 14 inline comments as done.

Harbormaster completed remote builds in B223075: Diff 510127.Mar 31 2023, 1:57 PM

DiggerLin added inline comments.Mar 31 2023, 1:57 PM

llvm/lib/Object/ArchiveWriter.cpp
335	It is needed; otherwise there is a compiler error on line 327, `return {{}, std::move(Header), Names, Pad ? "\n" : ""};`, namely "missing field 'SymFile' initializer [-Werror,-Wmissing-field-initializers]".
750–751	I thought "Pre" was a more concise abbreviation of "previous" than "Prev", but after searching the internet I found that "Prev" is better, so I changed it back.
792	There is a field `char ar_nxtmem[20]; /* Next member offset-decimal */` in the "File Member Header" of a big archive, so we need to calculate the padding before the header, if there is any.
879	Pos points to the beginning of the member file's header, which is not aligned; only the data of the member file is aligned.
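
The replies on lines 792 and 879 say that padding goes in front of a member's header so that the member's data, not the header, ends up aligned. A hedged sketch of that arithmetic: `alignUp` and `headPadding` are hypothetical names, and `fixedHeaderSize` is a parameter standing in for the size of the fixed part of the member header rather than a hard-coded value.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to the next multiple of align (align must be a power of two).
uint64_t alignUp(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  return (value + align - 1) & ~(align - 1);
}

// pos: offset at which the member header would start without extra padding.
// nameLen: length of the member name, which is padded to an even size.
// Returns the number of padding bytes to emit before the header so that the
// member data that follows the header and name starts at a multiple of
// memberAlign.
uint64_t headPadding(uint64_t pos, uint64_t fixedHeaderSize, uint64_t nameLen,
                     uint64_t memberAlign) {
  uint64_t offsetToData = pos + fixedHeaderSize + alignUp(nameLen, 2);
  return alignUp(offsetToData, memberAlign) - offsetToData;
}
```

Because the padding shifts the header itself, the previous member's next-member offset has to account for it, which is why the writer needs to look one member ahead.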

I think I now understand why you've changed the NewMember loop stuff at least, but I wanted to raise one thing before I make any further comments: the spec https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big makes no comment about all this complicated additional alignment stuff, and refers simply to aligning on even byte boundaries. From a traditional GNU-like archive format, this makes sense: the ar tool is designed for creating static archives that are linked by the static linker. Loadability is not a thing that needs thinking about.

Are you suggesting that AIX Big Archives can be used to store shared objects in an archive, and then that the runtime loader can directly load shared objects from these archives?

llvm/lib/Object/ArchiveWriter.cpp
335	Ah, I didn't see you used it like that. In which case, this is fine.
506	This and the next constant are only applicable to Big Archives/AIX, but that isn't communicated by the variable name. Could this one be renamed `Log2OfAIXPageSize`?
508
512	Similar to above, perhaps `MinBigArchiveMemDataAlign`?
515	Function names start with lower-case letters...
518	Don't specifically say "even" here, because the variable name is talking about "minimum".
523	This comment doesn't explain the "why" of what you're doing. I think it's important that you explain this for this case. This was the point I was trying to get across with my previous comment, related to the casting style. Why is this the right thing to do here (in particular why the "2")?
534–535
567	Any reason you didn't adopt this comment suggestion I made? (You can change "so align at 2." in my suggestion to "so align at the minimum value."

In D144872#4239935, @jhenderson wrote:

the spec https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big makes no comment about all this complicated additional alignment stuff, and refers simply to aligning on even byte boundaries. From a traditional GNU-like archive format, this makes sense: the ar tool is designed for creating static archives that are linked by the static linker. Loadability is not a thing that needs thinking about.

Are you suggesting that AIX Big Archives can be used to store shared objects in an archive, and then that the runtime loader can directly load shared objects from these archives?

Yes, the AIX loader loads shared objects directly from archives. This simplifies the search for shared objects, because both 32- and 64-bit shared objects can be in the same archive. In addition, an archive can contain multiple shared objects. It is more efficient for the system loader to load archive members if the members are aligned in the archive. In fact, the system loader will refuse to load 64-bit members that are not aligned properly.

It's true that the documentation for the archive file format does not mention the alignment requirement. I don't think the writers of the documentation anticipated that a third party would be writing its own archive management tools.

DiggerLin updated this revision to Diff 510500.Apr 3 2023, 7:55 AM

DiggerLin marked 6 inline comments as done.

Harbormaster completed remote builds in B223354: Diff 510500.Apr 3 2023, 7:56 AM

thanks for comments.

jhenderson added inline comments.Apr 14 2023, 1:00 AM

llvm/lib/Object/ArchiveWriter.cpp
523
527	I still don't understand specifically why "2". Should this use some form of `offsetof` to get the actual position in the struct or similar?
718	We've had this sort of discussion before: you can't use static local variables like this that will change each time this function is called, because this function could be called from multiple threads.
791	Please re-review the LLVM coding standards. You keep making mistakes, despite being told before, that are covered in the coding standards.
848–861	Some more comments explaining why you're doing what you're doing in this block would be good.
879–880	Is this value correct for the last member in an archive?
887–888	If I'm not mistaken, the implication of this code is that you will no longer be able to store non-symbolic-files in BigArchives. Is that intended (and if so, is it correct)? For other formats, you can store non-symbolic files as archive members. They just don't contribute anything to the symbol table.
887–891	If I'm not mistaken, you could avoid a lot of this duplicate logic about the SymbolicFile by pulling it out of the isAIXBigArchive code. Something like this outline:
for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
  if (NeedSymbols || isAIXBigArchive(Kind)) {
    if (M == NewMembers.begin()) {
      CurSymFile = // load symbolic file for first member
    } else {
      CurSymFile = std::move(NextSymFile);
    }
    if (M + 1 != NewMembers.end()) {
      NextSymFile = // load symbolic file for next member
    }
    // Do all of loop logic
  }
}
You'll need to handle non-symbolic files somehow, perhaps by just resetting the relevant std::unique_ptr to leave it empty, and then checking whether the pointer is empty before trying to read it.
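
On the line 718 comment about static local variables: one common way to avoid hidden shared state is to pass the mutable state in explicitly, so that each caller (or thread) owns its own copy. The sketch below is a generic illustration with hypothetical names (`WriterState`, `reserve`); it is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-writer state. Each archive being written gets its own
// instance, so concurrent writers never touch the same counter.
struct WriterState {
  uint64_t pos = 0;
};

// Reserves size bytes and returns the offset at which they start.
// Unlike a function-local static, the counter lives in the caller's state.
uint64_t reserve(WriterState &state, uint64_t size) {
  assert(size <= UINT64_MAX - state.pos && "offset overflow");
  uint64_t start = state.pos;
  state.pos += size;
  return start;
}
```

Two independent `WriterState` objects advance independently, whereas a `static` counter inside the function would be shared by every call in the process.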

address comment, thanks for James's comment .

Harbormaster completed remote builds in B226622: Diff 514982.Apr 19 2023, 9:10 AM

DiggerLin added inline comments.Apr 19 2023, 9:10 AM

llvm/lib/Object/ArchiveWriter.cpp
527	yes.
718	thanks for let me know.
791	Sorry for the carelessness; I do know that ++M is required by the LLVM coding standard.
879–880	Yes, the value is correct. The NextOffset of the last file member will point to the "Member Table", and there is no special requirement for the content of the "Member Table".
887–888	In the function
static Expected<std::unique_ptr<SymbolicFile>>
getSymbolFile(MemoryBufferRef Buf, LLVMContext &Context) {
  std::unique_ptr<object::SymbolicFile> Obj;
  const file_magic Type = identify_magic(Buf.getBuffer());
  // Treat non symbolic file types as nullptr.
  if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
    return nullptr;
it treats non-symbolic file types as nullptr. It does not return an error.
887–891	In the `if (M == NewMembers.begin())` block, the code not only opens CurSymFile but also calculates MemHeadPadSize. In the `if ((M + 1) != NewMembers.end())` block, it not only opens CurSymFile but also calculates NextMemHeadPadSize. Even if I do as you suggest, I still need `if (M == NewMembers.begin())` and `if ((M + 1) != NewMembers.end())` later for MemHeadPadSize and NextMemHeadPadSize.
Regarding "You'll need to handle non-symbolic files somehow, perhaps by just resetting the relevant std::unique_ptr to leave it empty, and then checking whether the pointer is empty before trying to read it.": the function getSymbolFile already returns nullptr for non-symbolic files.
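
The line 527 exchange asks whether `offsetof` could replace the hand-written `+ 2` when checking that an auxiliary header is large enough to contain a field. A sketch of that idea follows. `AuxHeaderSketch` is a made-up, heavily simplified layout, not the real XCOFF auxiliary header, and the function ignores byte order (XCOFF data is big-endian).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout for illustration only; NOT the real XCOFF layout.
struct AuxHeaderSketch {
  uint16_t magic;
  uint16_t version;
  uint32_t textSize;
  uint16_t maxAlignOfText;
  uint16_t maxAlignOfData;
};

// Copies maxAlignOfData out of a raw header of auxSize bytes. Returns false
// when the header is too short to contain the whole field.
bool readMaxAlignOfData(const unsigned char *aux, std::size_t auxSize,
                        uint16_t &out) {
  assert(aux != nullptr);
  // The field's position comes from the struct definition, not a magic number.
  constexpr std::size_t fieldOffset = offsetof(AuxHeaderSketch, maxAlignOfData);
  if (auxSize < fieldOffset + sizeof(out))
    return false;
  std::memcpy(&out, aux + fieldOffset, sizeof(out));
  return true;
}
```

Deriving the bound from the struct keeps the size check in step with the layout if fields are ever added or reordered.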

stephenpeckham accepted this revision.May 15 2023, 7:30 AM

This revision is now accepted and ready to land.May 15 2023, 7:30 AM

jhenderson added inline comments.Jun 12 2023, 12:18 AM

llvm/lib/Object/ArchiveWriter.cpp
822–823	(and make sure the comment respects the column limit properly - I haven't attempted to with my edit)
887–891	My complaint is that structurally, this and the above block are largely the same. Yes, one does a bit more than the other, but ultimately, the two are structurally identical. You could refactor the code to avoid this structural duplication, without too much difficulty, using my suggested code above. To do the extra work of calculating the padding sizes, simply add `if (isAIXBigArchive(Kind))` clauses:
for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
  if (NeedSymbols || isAIXBigArchive(Kind)) {
    if (M == NewMembers.begin()) {
      CurSymFile = // load symbolic file for first member
      if (isAIXBigArchive(Kind)) {
        // Do stuff with padding, as needed.
      }
    } else {
      CurSymFile = std::move(NextSymFile);
    }
    if (M + 1 != NewMembers.end()) {
      NextSymFile = // load symbolic file for next member
      if (isAIXBigArchive(Kind)) {
        // Do stuff with padding, as needed.
      }
    }
    // Do all of loop logic (or do some before the ifs and some after).
  }
}

DiggerLin updated this revision to Diff 543996.Jul 25 2023, 8:42 AM

DiggerLin marked 3 inline comments as done.

DiggerLin added inline comments.

llvm/lib/Object/ArchiveWriter.cpp
887–891	If I change it as you suggest, it only removes this code:
if (!isAIXBigArchive(Kind)) {
  Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr = getSymbolFile(Buf, Context);
  if (!SymFileOrErr)
    return createFileError(M->MemberName, SymFileOrErr.takeError());
  CurSymFile = std::move(SymFileOrErr);
}
I still need to put the variables `NextOffset` and `OffsetToMemData` under the `if (NeedSymbols || isAIXBigArchive(Kind))`, there will be several `if (isAIXBigArchive(Kind))` checks under `if (NeedSymbols || isAIXBigArchive(Kind)) {`, and we will have duplicated calls to `printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, M, ModTime, Size);`:
if (NeedSymbols || isAIXBigArchive(Kind)) {
  ...
  if (!isAIXBigArchive(Kind))
    printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, M, ModTime, Size);
} else {
  printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, M, ModTime, Size);
}
I do not think the logic of your suggestion is clearer than the current code. If you strongly suggest it, I can change it as you suggest.

Harbormaster completed remote builds in B248010: Diff 543996.Jul 25 2023, 8:43 AM

rebase code

Harbormaster completed remote builds in B249015: Diff 545395.Jul 29 2023, 8:11 PM

jhenderson added inline comments.Jul 31 2023, 2:36 AM

llvm/lib/Object/ArchiveWriter.cpp

863–864

887–891

Either you've misunderstood my suggestion or I've missed something. I'm pretty sure I haven't missed anything, as I've just experimented with reorganising the code locally, and came up with a perfectly reasonable looking solution without even referring back to my old one, yet after looking back at it, it looks fairly similar.

The changes I made were as follows:

for (...) {
  ... after the existing file size limit check ...

  if (NeedSymbols || isAIXBigArchive(Kind)) {
    auto SetNextSymFile = [&NextSymFile,
                           &Context](MemoryBufferRef Buf,
                                     StringRef MemberName) -> Error {
      Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
          getSymbolicFile(Buf, Context);
      if (!SymFileOrErr) {
        return createFileError(MemberName, SymFileOrErr.takeError());
      }
      NextSymFile = std::move(*SymFileOrErr);
      return Error::success();
    };

    if (M == NewMembers.begin())
      if (Error Err = SetNextSymFile(Buf, M->MemberName))
        return std::move(Err);
    CurSymFile = std::move(NextSymFile);

    if (M + 1 != NewMembers.end())
      if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(), (M + 1)->MemberName))
        return std::move(Err);
  }

  if (isAIXBigArchive(Kind)) {
    uint64_t OffsetToMemData = Pos + sizeof(object::BigArMemHdrType) +
                               alignTo(M->MemberName.size(), 2);

    if (M == NewMembers.begin()) {
      MemHeadPadSize = alignToPowerOf2(OffsetToMemData,
                                       getMemberAlignment(CurSymFile.get())) -
                       OffsetToMemData;
    } else {
      MemHeadPadSize = NextMemHeadPadSize;
    }

    ... update Pos, calculate NextOffset etc as before, then write header ...
  } else {
    printMemberHeader(...);
  }
  Out.flush();

  std::vector<unsigned> Symbols;
  if (NeedSymbols) {
    Expected<std::vector<unsigned>> SymbolsOrErr =
        getSymbols(CurSymFile.get(), Index, SymNames, SymMap);
    if (!SymbolsOrErr)
      return createFileError(M->MemberName, SymbolsOrErr.takeError());
    Symbols = std::move(*SymbolsOrErr);
    if (CurSymFile) // Can we remove this if check and set HasObject to true where CurSymFile is set?
      HasObject = true;
  }

  ... update Pos and Ret ...
}

I think this logic flows quite nicely, and certainly better than the current suggestion. It avoids multiple different call sites to getSymbolicFile, for the cost of a few simple ifs. It also avoids any significant nested ifs (i.e. ones where the inner if is a large block), which helps readability of the code.

Aside: is the check on CurSymFile for the HasObject setting necessary? This potentially could be simplified, but I haven't got the time to follow this logic.

address comment

llvm/lib/Object/ArchiveWriter.cpp
887–891	thanks , I follow your suggestion.

DiggerLin updated this revision to Diff 545733.Jul 31 2023, 10:25 AM

Harbormaster completed remote builds in B249269: Diff 545733.Jul 31 2023, 12:50 PM

Just as a heads-up, I'm off for the rest of the week. Hopefully I'll be able to get to any final points either before the end of my workday or later next week.

llvm/lib/Object/ArchiveWriter.cpp
891–892	Did you see my aside about HasObject at the end of my long inline comment? Can the logic around `HasObject` be simplified in the latest version? I've not followed it enough to be confident either way.

DiggerLin marked an inline comment as done.Aug 1 2023, 6:44 AM

DiggerLin added inline comments.

llvm/lib/Object/ArchiveWriter.cpp
891–892	"Can the logic around HasObject be simplified in the latest version."
I do not think the HasObject logic can be simplified. In the loop:
for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
  .....
  if (CurSymFile)
    HasObject = true;
  ....
}
.....
if (HasObject && SymNames.tell() == 0 && !isCOFFArchive(Kind))
  SymNames << '\0' << '\0' << '\0';
return Ret;
If any member is a symbolic file, SymNames is empty, and `!isCOFFArchive(Kind)` holds, it will execute `SymNames << '\0' << '\0' << '\0';`. I do not think we can simplify it.

DiggerLin marked an inline comment as done.Aug 1 2023, 6:45 AM

In D144872#4549482, @jhenderson wrote:

Just as a heads-up, I'm off for the rest of the week. Hopefully I'll be able to get to any final points either before the end of my workday or later next week.

Enjoy your vacation.

I'll be honest, I'm not convinced the test coverage looks to be anywhere near comprehensive enough. However, I also haven't attempted to review it versus the code changes to check how much of the new code is actually covered. I'd suggest you consider using a code coverage tool to see how much coverage there is of your new code.

llvm/lib/Object/ArchiveWriter.cpp
508–511	Could you reflow this comment, please? It looks like some of the word wrapping is happening earlier than needed. A quick look at many of the other new comments in this file makes it look like they have the same issue. Please review all your comments and reflow as necessary.
544–560	I can't remember whether we decided it should be "Big Archive" or "big archive". Either way, you should be consistent in all your comments and use one or the other in every case, rather than a mixture (I think "big archive" is the norm elsewhere in this patch).
681	Could you please move this back to where it was (immediately before the `if (SymMap)`. There's no reason to move it, and by moving it you've left it further from its point of use than it needs to be.
828–830	No need for the braces here.
1171	It may be slightly more efficient to do something like: OS << std::string(M.PreHeadPadSize, '\0'); It's certainly a little more elegant. Alternatively, you could use `std::fill_n`. There are good explanations of both these at https://stackoverflow.com/a/11421689.

In D144872#4579160, @jhenderson wrote:

I'll be honest, I'm not convinced the test coverage looks to be anywhere near comprehensive enough. However, I also haven't attempted to review it versus the code changes to check how much of the new code is actually covered. I'd suggest you consider using a code coverage tool to see how much coverage there is of your new code.

The test big-archive-xcoff-auxi-head-align.test checks whether an XCOFF object member that has an auxiliary header is aligned correctly based on the information in the auxiliary header; that is enough here. On AIX, since the build bot uses CMake 3.22, it uses llvm-ar instead of the AIX ar, so it will test the functionality too. (We currently have to add -DCMAKE_AR=/usr/bin/ar when we compile, since llvm-ar does not align XCOFF objects correctly for big archives on AIX.) After the patch is committed, we will use `llvm-ar` instead of the AIX `ar`.

DiggerLin updated this revision to Diff 549994.Aug 14 2023, 9:36 AM

DiggerLin marked 3 inline comments as done.

Harbormaster completed remote builds in B252405: Diff 549994.Aug 14 2023, 9:37 AM

DiggerLin added inline comments.Aug 14 2023, 9:40 AM

llvm/lib/Object/ArchiveWriter.cpp
1171	We need to write M.PreHeadPadSize '\0' bytes. With std::string(M.PreHeadPadSize, '\0'), the string is an empty string, so `OS << std::string(M.PreHeadPadSize, '\0')` does not output M.PreHeadPadSize '\0' bytes.

rebase the patch

Harbormaster completed remote builds in B252456: Diff 550070.Aug 14 2023, 6:45 PM

jhenderson added inline comments.Aug 15 2023, 1:57 AM

llvm/lib/Object/ArchiveWriter.cpp
1171	Did you actually try my suggestion? Your statement is wrong. I've tried it with some simple code and it works fine - `OS << std::string(10, '\0')` writes 10 null bytes to the output. `std::string(M.PreHeadPadSize, '\0')` constructs a `std::string` that contains `M.PreHeadPadSize` null bytes. It is NOT an empty string (although using `.c_str()` will make it look like it is).
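
The behaviour described in this reply can be checked in isolation. The sketch below uses `std::ostringstream` as a stand-in for the output stream in the patch (an assumption made so the example is self-contained), and `writePadding` and `cStringLength` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Writes padSize null bytes to a stream and returns what was written.
std::string writePadding(std::size_t padSize) {
  std::ostringstream os;
  // The (count, char) constructor builds a string of padSize '\0' characters.
  os << std::string(padSize, '\0');
  assert(os.good());
  return os.str();
}

// Length as seen through the C-string view, which stops at the first '\0'.
// This is why such a string can look empty even though it is not.
std::size_t cStringLength(const std::string &s) {
  return std::strlen(s.c_str());
}
```

Here `writePadding(10).size()` is 10, while `cStringLength(writePadding(10))` is 0, which matches the explanation that the string only appears empty when viewed through `.c_str()`.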

In D144872#4584865, @DiggerLin wrote:

In D144872#4579160, @jhenderson wrote:

I'll be honest, I'm not convinced the test coverage looks to be anywhere near comprehensive enough. However, I also haven't attempted to review it versus the code changes to check how much of the new code is actually covered. I'd suggest you consider using a code coverage tool to see how much coverage there is of your new code.

The test big-archive-xcoff-auxi-head-align.test checks whether an XCOFF object member that has an auxiliary header is aligned correctly based on the information in the auxiliary header; that is enough here. On AIX, since the build bot uses CMake 3.22, it uses llvm-ar instead of the AIX ar, so it will test the functionality too. (We currently have to add -DCMAKE_AR=/usr/bin/ar when we compile, since llvm-ar does not align XCOFF objects correctly for big archives on AIX.) After the patch is committed, we will use llvm-ar instead of the AIX ar.

For your code to land in LLVM, it needs to have appropriate levels of testing in LLVM (i.e. lit tests and gtest unit tests). Otherwise, an LLVM developer could easily make a change that breaks the existing behaviour in a subtle way. Relying on it being a system archiver on some cases will not provide the level of coverage you need IN LLVM. Bugs will creep in and will impact your system users, which is of no benefit to anyone.

added new test scenarios .

Harbormaster completed remote builds in B252652: Diff 550342.Aug 15 2023, 9:22 AM

DiggerLin marked 2 inline comments as done.Aug 16 2023, 6:42 AM

DiggerLin added inline comments.

llvm/lib/Object/ArchiveWriter.cpp
1171	it work, thanks

I've added a few more nits. I've also gone through and highlighted a number of cases that I'm pretty confident don't have existing test coverage. I shouldn't have had to go through these bits myself and highlight all of this - you should have done this yourself, prior to the patch even going up for review. Requesting changes to clear the "Accepted" state.

llvm/lib/Object/ArchiveWriter.cpp
526–527	This doesn't seem to be covered?
531	`SecNumOfLoader` is unsigned, right? So it can never be less than 0 by definition...
531–532	This probably isn't covered?
540	This needs test cases for the text alignment being bigger and the data alignment being bigger. I can't tell from the existing tests whether both cases are covered.
541	Are both parts of this ternary covered?
550	Nit: delete this blank line - the if below is strongly linked to the previous line.
555
836	Test case needed.
843	Test case needed.
llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
6
7	Delete this blank line - the comment above is associated with the following test case, but the blank line suggests it either doesn't, or that it applies to the whole test file.
9	Don't rely on inputs in another part of the test tree. It is quite possible that these files will move/be deleted etc, and people won't expect that to impact tests in another part of the testing tree. Instead, you should create these files using yaml2obj. The other aspect of this is that I have no way of telling that the inputs you're using actually have the properties that you are trying to test for (without inspecting the binaries). Having the yaml2obj form of them would enable this.
13	Nit: prefer `--check-prefix` over `-check-prefix` in new tests. Applies throughout.
23	This blank line to me means the comment above is not associated with the checks that follow below. Delete it.

This revision now requires changes to proceed.Aug 23 2023, 12:23 AM

add new test scenario

llvm/lib/Object/ArchiveWriter.cpp
526–527	Do we need a test case to cover every statement?
836	We just refactored the code here without adding new functionality, so I do not think we need a new test case here.
843	Same reason as above.

DiggerLin updated this revision to Diff 553655.Aug 25 2023, 3:45 PM

DiggerLin added inline comments.

llvm/lib/Object/ArchiveWriter.cpp
526–527	I added a test scenario for it anyway.

Harbormaster completed remote builds in B255019: Diff 553655.Aug 25 2023, 7:55 PM

jhenderson added inline comments.Aug 29 2023, 2:34 AM

llvm/lib/Object/ArchiveWriter.cpp
526–527	Yes, all statements should be covered by tests. Otherwise, you've got nothing that will show that this code is functioning correctly.
836	Although I agree that the code to load a symbolic file is just a refactoring of existing code, this use of that code is a new call site, where an error needs checking in a test. If there's an existing test that hits this specific piece of code, then that's fine, but if there isn't, it needs a new test. Otherwise, there would be no test that shows that the error is properly handled in this case. Same goes below.
llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
24	Rather than repeating this python snippet over and over again, could you write a little python script at the end of this file, use split-file to split it into a .py file at runtime, and then execute the file with all the different input arguments? The RUN line would end up something like:
# RUN: %python print-magic.py 262 | FileCheck --check-prefix=MAGIC32
37–38	Or: "Test that the content of XCOFF object files, which don't have auxiliary headers, are aligned at 2 in a big archive."
55	This is a near-duplicate of the previous YAML doc. Could you just parameterise the section name, like you do with the `FLAG` input parameter?
62
74
102	I'm not sure what "but excess the page size" means. Should it be "but they exceed the page size" or something?

address comment

DiggerLin added inline comments.Aug 29 2023, 10:53 AM

llvm/lib/Object/ArchiveWriter.cpp
836	There is an existing test for this code, added when you implemented https://reviews.llvm.org/D88288:
https://github.com/llvm/llvm-project/blob/main/llvm/test/Object/archive-malformed-object.test#L13
https://github.com/llvm/llvm-project/blob/main/llvm/test/Object/archive-malformed-object.test#L19

DiggerLin marked an inline comment as done.Aug 29 2023, 10:53 AM

Harbormaster completed remote builds in B255580: Diff 554424.Aug 29 2023, 2:59 PM

jhenderson added inline comments.Aug 30 2023, 12:47 AM

llvm/lib/Object/ArchiveWriter.cpp
836	Okay, thanks. I agree that this check on this line is covered by the existing test. However, `archive-malformed-object.test` doesn't cover the case where a second or later member is bad, I believe, which means that the check and report of the error at lines 841-843 in this current version of the patch aren't covered still. Also, that test only covers it for the case where symbol tables are needed (i.e. when llvm-ar without the "S" option is used). Ideally, you would also have a test case that shows that when no symbols are requested (i.e. `llvm-ar S<more options> test.a xcoff.o`), an error is also reported for an invalid member of a big archive. You could cover both missing cases sufficiently by adding the following two test cases somewhere:
llvm-ar rcS archive.a bad.o
llvm-ar rcS archive.a good.o bad.o
where `archive.a` is a big archive, `bad.o` is a file that will cause `getSymbolicFile` to fail, and `good.o` is a file that it will work with. (Obviously you'd rename these files in the real test)
llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
57	It looks like you missed that it should be "has neither" not "have neither"
98

Added more test scenarios and addressed comments.

DiggerLin added inline comments.Aug 31 2023, 8:56 AM

llvm/lib/Object/ArchiveWriter.cpp
836	"Okay, thanks. I agree that this check on this line is covered by the existing test. However, archive-malformed-object.test doesn't cover the case where a second or later member is bad, I believe, which means that the check and report of the error at lines 841-843 in this current version of the patch aren't covered still."
I will add a new test scenario to cover it.
"Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive."
What is the purpose of the test? Do you want to test whether the code goes into the following block when `isAIXBigArchive(Kind)` in `if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind))` is true, or to test the functionality of the following block (we already have a test scenario for it)?
{auto SetNextSymFile = [&NextSymFile, .....
if ((M + 1) != NewMembers.end())
  if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(), (M + 1)->MemberName))
    return std::move(Err);
}
If we want to check whether the code goes into the above block, I think the alignment functionality would not work without going into it, and we already have a test scenario for the alignment of big archives. I added the test scenario anyway.

Harbormaster completed remote builds in B256043: Diff 555069.Aug 31 2023, 9:21 AM

jhenderson added inline comments.Sep 1 2023, 12:32 AM

llvm/lib/Object/ArchiveWriter.cpp
836	"what is the purpose the test ?"
Unlike the last few tests I've requested, which test very specific code paths, this is more of a high-level-thinking test (often called a "black box test"), i.e. one where we don't know what the details of the code are. Specifically, the aim of the test is to show that "For Big Archive, if an input object can't be loaded, we correctly report an error." Such high-level tests are useful in addition to the low-level coverage tests, because they are less likely to be invalidated by subsequent changes to the code (for example, if somebody changed this code path to be big archive only, because they decide to handle symbol tables differently, some of the test coverage provided by the existing test would no longer cover this area of code). High-level tests can often provide low-level coverage at the same time, and similarly low-level coverage tests can cover high-level topics at the same time (e.g. the other test you've recently modified covers "For archives that require a symbol table, if an input object can't be loaded, we correctly report an error"). A consequence of this dual-purpose is that sometimes there is overlap in terms of raw coverage that tests provide, as you've seen here.
llvm/test/Object/archive-malformed-object.test
15	I don't think this test should use an XCOFF object. XCOFF is not a well-known format, so using it purely to provide a "good" first object will make it look like it has been chosen very deliberately, which confuses the purpose of the test. Better would be to have another bitcode object that isn't malformed as the first object. You can then have `bad.bc` and `good.bc`, which more clearly indicates what is important.
17	I'm pretty sure you don't want the `S` here? Compare this command to the one above. If you are using a non-big archive format (which will be the default on many people's systems), then the symbols are only loaded if `S` is not present. If the symbols aren't loaded, the bitcode file won't result in an error (I think).

DiggerLin marked 2 inline comments as done.Sep 1 2023, 10:00 AM

DiggerLin added inline comments.

llvm/test/Object/archive-malformed-object.test
17	I need the `S` to address your comment: `Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.` The archive format always depends on the first object file in the llvm-ar arguments (no matter the OS). When that file is an XCOFF object file, the archive is a big archive.

address comment

delete an empty line

jhenderson added inline comments.Sep 4 2023, 12:43 AM

llvm/test/Object/archive-malformed-object.test
17	Oh, okay I misremembered how the format selection works. Using `--format=bigarchive` is probably a good idea anyway for clarity. My expectation with the suggestion you've quoted was that `xcoff.o` would be the file that couldn't be read, rather than your test case mixing bitcode and xcoff files. I think mixing formats might be a little bit confusing, so I'd either have one/two xcoff files (so the same pair of cases as the invalid bitcode with symbol table cases above, but using xcoff files), or even just the same as the previous two cases, but with `S` (and the `--format=bigarchive` option), using the same bitcode files. I have no particular preference (but it probably is a good idea to include both "first" and "not first" cases for completeness).

address comment

Harbormaster completed remote builds in B256638: Diff 555917.Sep 5 2023, 1:32 PM

LGTM, with 2 comment nits and one small test issue that should be addressed before merging.

llvm/test/Object/archive-malformed-object.test
32–34	Do we need to be explicit about gnu format? If the archive format is derived from the first member, this won't be big archive, right? If we don't need to be explicit, please remove the `--format` option, so that it can capture more cases. If for whatever reason it would be bigarchive without the format option, it's fine to leave as-is. In either case, I suggest changing the comment to say "not required for formats other than the big archive format." as it's not specifically gnu format here that's important.
llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
39	Nit, missed earlier.
69

This revision is now accepted and ready to land.Sep 6 2023, 12:31 AM

DiggerLin marked an inline comment as done.Sep 6 2023, 8:08 AM

DiggerLin added inline comments.

llvm/test/Object/archive-malformed-object.test
32–34	input.o is a malformed object file, so the archive format depends on getDefaultKindForHost(). So --format=gnu is needed here.

Closed by commit rG4cc7c749c31e: [AIX] Align the content of an xcoff object file which has auxiliary header in… (authored by zhijian <zhijian@ca.ibm.com>). Sep 6 2023, 8:36 AM

This revision was automatically updated to reflect the committed changes.

zhijian <zhijian@ca.ibm.com> added a commit: rG4cc7c749c31e: [AIX] Align the content of an xcoff object file which has auxiliary header in….

zhijian <zhijian@ca.ibm.com> mentioned this in rGf903119cce63: Fixed a compile error on use of deleted function ‘{anonymous}::MemberData….Sep 6 2023, 9:05 AM

Revision Contents

Path	Size
llvm/lib/Object/ArchiveWriter.cpp	236 lines
llvm/test/Object/archive-malformed-object.test	17 lines
llvm/test/tools/llvm-ar/big-archive-xcoff-align.test	109 lines

Diff 556046

llvm/lib/Object/ArchiveWriter.cpp


} }

namespace { namespace {

struct MemberData { struct MemberData {

std::vector<unsigned> Symbols; std::vector<unsigned> Symbols;

std::string Header; std::string Header;

StringRef Data; StringRef Data;

StringRef Padding; StringRef Padding;

uint64_t PreHeadPadSize = 0;

std::unique_ptr<SymbolicFile> SymFile = nullptr;

jhendersonUnsubmitted

Done

uint64_t PreHeadPadSize = 0;

- std::unique_ptr<SymbolicFile> SymFile = nullptr;

+ std::unique_ptr<SymbolicFile> SymFile;

};

} // namespace

No need for explicit assignment of nullptr - unique_ptr's default constructor leaves it in an empty state.


DiggerLinAuthorUnsubmitted

Done

It is needed; otherwise there is a compiler error on line 327,
"return {{}, std::move(Header), Names, Pad ? "\n" : ""};"

reported as "missing field 'SymFile' initializer [-Werror,-Wmissing-field-initializers]".


jhendersonUnsubmitted

Not Done

Ah, I didn't see you used it like that. In which case, this is fine.
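
The exchange above turns on how a braced initializer that lists only some fields interacts with default member initializers. The following is a minimal sketch, not the patch's code: `Member` and `FakeSymbolicFile` are hypothetical stand-ins for `MemberData` and `SymbolicFile`, and the claim that the explicit `= nullptr` is what silences `-Wmissing-field-initializers` is taken from the author's report rather than verified here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the symbolic-file type; not the LLVM class.
struct FakeSymbolicFile {};

// Hypothetical stand-in for MemberData. From C++14 on, default member
// initializers do not stop this struct from being an aggregate.
struct Member {
  std::vector<unsigned> Symbols;
  std::string Header;
  std::string Data;
  std::string Padding;
  std::uint64_t PreHeadPadSize = 0;
  std::unique_ptr<FakeSymbolicFile> SymFile = nullptr;
};

// Only the first four fields are listed in the braces; the remaining two
// fall back to their default member initializers.
Member makeStringTableMember(std::string Header, std::string Names, bool Pad) {
  return {{}, std::move(Header), std::move(Names), Pad ? "\n" : ""};
}
```

With or without the explicit `= nullptr`, the omitted `SymFile` ends up empty; the difference discussed in the thread is only whether the compiler warns about the omission.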


}; };

} // namespace } // namespace

static MemberData computeStringTable(StringRef Names) { static MemberData computeStringTable(StringRef Names) {

unsigned Size = Names.size(); unsigned Size = Names.size();

unsigned Pad = offsetToAlignment(Size, Align(2)); unsigned Pad = offsetToAlignment(Size, Align(2));

std::string Header; std::string Header;

raw_string_ostream Out(Header); raw_string_ostream Out(Header);

… getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {

} else { } else {

auto ObjOrErr = object::SymbolicFile::createSymbolicFile(Buf); auto ObjOrErr = object::SymbolicFile::createSymbolicFile(Buf);

if (!ObjOrErr) if (!ObjOrErr)

return ObjOrErr.takeError(); return ObjOrErr.takeError();

return std::move(*ObjOrErr); return std::move(*ObjOrErr);

} }

static Expected<bool> is64BitSymbolicFile(const StringRef &ObjStringRef) { static bool is64BitSymbolicFile(const SymbolicFile *SymObj) {

MemoryBufferRef ObjMbf(ObjStringRef, ""); return SymObj != nullptr ? SymObj->is64Bit() : false;

// In the scenario when LLVMContext is populated SymbolicFile will contain a }

// reference to it, thus SymbolicFile should be destroyed first.

LLVMContext Context;

Expected<std::unique_ptr<SymbolicFile>> ObjOrErr =

getSymbolicFile(ObjMbf, Context);

if (!ObjOrErr)

return ObjOrErr.takeError();

// Treat non-symbolic file types as not 64-bits. // Log2 of PAGESIZE(4096) on an AIX system.

if (!*ObjOrErr) static const uint32_t Log2OfAIXPageSize = 12;

jhendersonUnsubmitted

Done

This and the next constant are only applicable to Big Archives/AIX, but that isn't communicated by the variable name. Could this one be renamed Log2OfAIXPageSize?


return false;

return (*ObjOrErr)->is64Bit(); // In the AIX big archive format, since the data content follows the member file

jhendersonUnsubmitted

Done

static const uint32_t Log2OfPageSize = 12;

- // In AIX big archive, Since the data content follows the member

+ // In the AIX big archive format, since the data content follows the member

// file name, if the name ends on an odd byte, an extra byte will be


// name, if the name ends on an odd byte, an extra byte will be added for

// padding. This ensures that the data within the member file starts at an even

// byte.

jhendersonUnsubmitted

Done

Could you reflow this comment, please? It looks like some of the word wrapping is happening earlier than needed.

A quick look at many of the other new comments in this file makes it look like they have the same issue. Please review all your comments and reflow as necessary.


static const uint32_t MinBigArchiveMemDataAlign = 2;

jhendersonUnsubmitted

Done

Similar to above, perhaps MinBigArchiveMemDataAlign?


template <typename AuxiliaryHeader>

uint16_t getAuxMaxAlignment(uint16_t AuxHeaderSize, AuxiliaryHeader *AuxHeader,

jhendersonUnsubmitted

Done

template <typename AuxiliaryHeader>

- uint16_t GetAuxMaxAlignment(uint16_t AuxHeaderSize, AuxiliaryHeader *AuxHeader,

+ uint16_t getAuxMaxAlignment(uint16_t AuxHeaderSize, AuxiliaryHeader *AuxHeader,

uint16_t Log2OfMaxAlign) {

Function names start with lower-case letters...


uint16_t Log2OfMaxAlign) {

// If the member doesn't have an auxiliary header, it isn't a loadable object

// and so it just needs aligning at the minimum value.

jhendersonUnsubmitted

Done

// If the member doesn't have an auxiliary header, it isn't a

- // loadable object and so it just need align at even.

+ // loadable object and so it just needs aligning at the minimum value.

if (AuxHeader == nullptr)

Don't specifically say "even" here, because the variable name is talking about "minimum".


if (AuxHeader == nullptr)

return MinBigArchiveMemDataAlign;

// If the auxiliary header does not have both MaxAlignOfData and

// MaxAlignOfText field, it is not a loadable shared object file, so align at

jhendersonUnsubmitted

Not Done

This comment doesn't explain the "why" of what you're doing. I think it's important that you explain this for this case.

This was the point I was trying to get across with my previous comment, related to the casting style. Why is this the right thing to do here (in particular why the "2")?


jhendersonUnsubmitted

Done

// If the auxiliary header does not have both MaxAlignOfData and

- // MaxAlignOfText field, it is not loadable share object file,

+ // MaxAlignOfText field, it is not a loadable share object file, so

// align at the minimum value.


// the minimum value. The 'ModuleType' member is located right after

// 'MaxAlignOfData' in the AuxiliaryHeader.

if (AuxHeaderSize < offsetof(AuxiliaryHeader, ModuleType))

return MinBigArchiveMemDataAlign;

jhendersonUnsubmitted

Done

I still don't understand specifically why "2". Should this use some form of offsetof to get the actual position in the struct or similar?


DiggerLinAuthorUnsubmitted

Done

yes.
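
The `offsetof` form agreed on above can be illustrated in isolation. The sketch below uses a hypothetical, much-reduced layout (`ToyAuxHeader` is not the real XCOFF auxiliary header; only the relative order of `MaxAlignOfData` and `ModuleType` is taken from the patch comment). It shows how comparing the recorded header size with the offset of the field that follows `MaxAlignOfData` tells whether both alignment fields are fully present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified layout. ModuleType directly follows
// MaxAlignOfData, as the patch comment states for the real header; the
// other fields and all sizes are illustrative only.
struct ToyAuxHeader {
  std::uint16_t Magic;
  std::uint16_t Version;
  std::uint16_t MaxAlignOfText;
  std::uint16_t MaxAlignOfData;
  std::uint16_t ModuleType;
};

// Returns true if a header of HeaderSize bytes is long enough to hold
// MaxAlignOfText and MaxAlignOfData in full, i.e. it reaches at least
// the start of the next field.
bool hasBothAlignFields(std::uint16_t HeaderSize) {
  return HeaderSize >= offsetof(ToyAuxHeader, ModuleType);
}
```

Compared with hand-written pointer arithmetic plus a literal `2`, this states the intent directly and stays correct if the field widths change.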


jhendersonUnsubmitted

Done

This doesn't seem to be covered?


DiggerLinAuthorUnsubmitted

Done

Do we need a test case to cover every statement?


DiggerLinAuthorUnsubmitted

Done

I added a test scenario for it anyway.


jhendersonUnsubmitted

Not Done

Yes, all statements should be covered by tests. Otherwise, you've got nothing that will show that this code is functioning correctly.


// If the XCOFF object file does not have a loader section, it is not

// loadable, so align at the minimum value.

if (AuxHeader->SecNumOfLoader == 0)

jhendersonUnsubmitted

Not Done

// loadable, so align at the minimum value.

- if (AuxHeader->SecNumOfLoader <= 0)

+ if (AuxHeader->SecNumOfLoader == 0)

return MinBigArchiveMemDataAlign;

SecNumOfLoader is unsigned, right? So it can never be less than 0 by definition...


return MinBigArchiveMemDataAlign;

jhendersonUnsubmitted

Not Done

This probably isn't covered?


// The content of the loadable member file needs to be aligned at MAX(maximum

// alignment of .text, maximum alignment of .data) if there are both fields.

jhendersonUnsubmitted

Done

// The content of the loadable member file needs to be aligned at

- // MAX(maximum alignment of .text, maximum alignment of .data) if there is

- // two fields. If desired alignment is > PAGESIZE, 32-bit members are aligned

+ // MAX(maximum alignment of .text, maximum alignment of .data) if there are

+ // both fields. If the desired alignment is > PAGESIZE, 32-bit members are aligned

// on a word boundary, while 64-bit members are aligned on a


// If the desired alignment is > PAGESIZE, 32-bit members are aligned on a

// word boundary, while 64-bit members are aligned on a PAGESIZE(2^12=4096)

// boundary.

uint16_t Log2OfAlign =

std::max(AuxHeader->MaxAlignOfText, AuxHeader->MaxAlignOfData);

jhendersonUnsubmitted

Not Done

This needs test cases for the text alignment being bigger and the data alignment being bigger. I can't tell from the existing tests whether both cases are covered.


return 1 << (Log2OfAlign > Log2OfAIXPageSize ? Log2OfMaxAlign : Log2OfAlign);

jhendersonUnsubmitted

Not Done

Are both parts of this ternary covered?


}
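
For readers following the branches that the reviewer asks to have covered, here is a self-contained sketch of the decision logic in the function above. It is an approximation under stated assumptions: `ToyAuxHeader` and `toyMaxAlignment` are hypothetical names, the header layout is reduced to the fields the logic touches, and the real function is a template over the 32-bit and 64-bit XCOFF auxiliary header types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t Log2OfAIXPageSize = 12; // log2(4096)
constexpr std::uint32_t MinBigArchiveMemDataAlign = 2;

// Hypothetical reduced header; the alignment fields hold log2 values.
struct ToyAuxHeader {
  std::uint16_t SecNumOfLoader;
  std::uint16_t MaxAlignOfText;
  std::uint16_t MaxAlignOfData;
  std::uint16_t ModuleType;
};

std::uint32_t toyMaxAlignment(std::uint16_t AuxHeaderSize,
                              const ToyAuxHeader *AuxHeader,
                              std::uint16_t Log2OfMaxAlign) {
  // No auxiliary header: not loadable, use the minimum alignment.
  if (AuxHeader == nullptr)
    return MinBigArchiveMemDataAlign;
  // Header too short to contain both alignment fields.
  if (AuxHeaderSize < offsetof(ToyAuxHeader, ModuleType))
    return MinBigArchiveMemDataAlign;
  // No loader section: not loadable.
  if (AuxHeader->SecNumOfLoader == 0)
    return MinBigArchiveMemDataAlign;
  // Take the larger of the two log2 alignments; above the page size,
  // fall back to the cap supplied by the caller.
  std::uint16_t Log2OfAlign =
      std::max(AuxHeader->MaxAlignOfText, AuxHeader->MaxAlignOfData);
  return 1u << (Log2OfAlign > Log2OfAIXPageSize ? Log2OfMaxAlign : Log2OfAlign);
}
```

In this sketch a header with log2 alignments 3 and 5 yields 32, while one with 14 and 2 yields 4096 when the cap is 12 (the 64-bit case in the patch) and 4 when the cap is 2 (the 32-bit case). Those two inputs exercise both arms of the ternary and both orderings of text versus data alignment.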

// AIX big archives may contain shared object members. The AIX OS requires these

// members to be aligned if they are 64-bit and recommends it for 32-bit

// members. This ensures that when these members are loaded they are aligned in

// memory.

static uint32_t getMemberAlignment(SymbolicFile *SymObj) {

XCOFFObjectFile *XCOFFObj = dyn_cast_or_null<XCOFFObjectFile>(SymObj);

if (!XCOFFObj)

jhendersonUnsubmitted

Done

Nit: delete this blank line - the if below is strongly linked to the previous line.


return MinBigArchiveMemDataAlign;

// If the desired alignment is > PAGESIZE, 32-bit members are aligned on a

// word boundary, while 64-bit members are aligned on a PAGESIZE boundary.

return XCOFFObj->is64Bit()

jhendersonUnsubmitted

Done

Perhaps you could flip this on its head. Something like:

XCOFFObjectFile *XCOFFObj = dyn_cast_or_null<XCOFFObjectFile>(SymObj);
if (!XCOFFObj)
  // Replace this comment with a comment that says why "2" is the right value.
  return 2;
...


jhendersonUnsubmitted

Done

// If the desired alignment is > PAGESIZE, 32-bit members are aligned on a

- // word boundary, while 64-bit members are aligned on a PAGESIZE boundary

+ // word boundary, while 64-bit members are aligned on a PAGESIZE boundary.

return XCOFFObj->is64Bit()


? getAuxMaxAlignment(XCOFFObj->fileHeader64()->AuxHeaderSize,

jhendersonUnsubmitted

Done

I think I'd make this a free-standing function. The body is long and it doesn't capture any variables, so I don't think making it a lambda is particularly helpful for readability.

I'd also rename Size to be clearer what size it represents (e.g. "AuxHeaderSize" if that is correct).


XCOFFObj->auxiliaryHeader64(),

Log2OfAIXPageSize)

jhendersonUnsubmitted

Done

Add a comment like: "If the member doesn't have an auxiliary header, it isn't a loadable object and so doesn't need aligning."


: getAuxMaxAlignment(XCOFFObj->fileHeader32()->AuxHeaderSize,

XCOFFObj->auxiliaryHeader32(), 2);

jhendersonUnsubmitted

Not Done

I can't remember whether we decided it should be "Big Archive" or "big archive". Either way, you should be consistent in all your comments and use one or the other in every case, rather than a mixture (I think "big archive" is the norm elsewhere in this patch).


} }

jhendersonUnsubmitted

Done

At the moment, I'm struggling to follow parts of this comment so I'd like to propose rewording it as follows:

"AIX Big Archives may contain shared object members. The AIX OS requires these members to be aligned if they are 64-bit and recommends it for 32-bit members. This ensures that when these members are loaded they are aligned in memory."

I think the rest of the comment can be moved into the method body, and I'll comment as appropriate.


jhendersonUnsubmitted

Not Done

return 2;

- // If the auxiliary header have not both MaxAlignOfData and MaxAlignOfText

+ // If the auxiliary header does not have both MaxAlignOfData and MaxAlignOfText

// field, align at 2.

if (Size < ((const char *)(&(AuxHeader->MaxAlignOfData)) -

It's not clear to me why if it is missing one or other of these that 2 is the right choice. Why is it not the other alignment value?


static void writeSymbolTable(raw_ostream &Out, object::Archive::Kind Kind, static void writeSymbolTable(raw_ostream &Out, object::Archive::Kind Kind,

bool Deterministic, ArrayRef<MemberData> Members, bool Deterministic, ArrayRef<MemberData> Members,

jhendersonUnsubmitted

Done

Please use static_cast or reinterpret_cast, not C-style casts. That being said, I'm struggling to follow the logic here. Is this essentially testing if the Header is too small to contain the MaxAlignOfData field? If so, is that actually a permitted case? The + 2 in particular is throwing me off though.


StringRef StringTable, uint64_t MembersOffset, StringRef StringTable, uint64_t MembersOffset,

jhendersonUnsubmitted

Done

return (*ObjOrErr)->is64Bit();

}

- // In AIX OS, if the member file is an XCOFF object file and has an auxiliary

- // header, the content of the member file need to be aligned at the

- // MAX(maximum alignment of .text , maximum alignment of .data).

+ // For the AIX Big Archive format, if a member file is an XCOFF object file and has an auxiliary

+ // header, the content of the member file needs to be aligned at

+ // MAX(maximum alignment of .text, maximum alignment of .data).

static uint32_t getMemberAlignment(const StringRef &ObjStringRef) {

This should be referring to the Big Archive format, right, not the OS?

Other suggestions in the inline edit.


stephenpeckhamUnsubmitted

Done

The alignment is not a requirement of the Big Archive format. It's required by the OS for 64-bit members and recommended for 32-bit members. AIX allows shared objects to be archive members. When archive members are loaded, they are mapped into memory. If the members aren't aligned properly in the archive, they won't be aligned in memory. Both .text and .data are mapped, so the required member alignment takes into account both the .text and .data alignment. Alignment is not necessary for members that are not loadable.
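
Because members are mapped straight from the archive, the alignment has to hold for the file offset of the member's data, not of its header. The sketch below shows one way to compute the padding that must precede a member header so that the data after it lands on the required boundary. It is an assumption-laden illustration: `alignUp` and `padBeforeHeader` are hypothetical helpers, and the header size passed in is whatever the caller's format uses.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Value up to the next multiple of Align. Align must be a power
// of two.
constexpr std::uint64_t alignUp(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: number of pad bytes to emit before a member
// header at offset Pos so that the member data, which follows the
// header and the name (itself padded to an even length), starts on a
// DataAlign boundary.
constexpr std::uint64_t padBeforeHeader(std::uint64_t Pos,
                                        std::uint64_t HeaderSize,
                                        std::uint64_t NameLen,
                                        std::uint64_t DataAlign) {
  std::uint64_t OffsetToData = Pos + HeaderSize + alignUp(NameLen, 2);
  return alignUp(OffsetToData, DataAlign) - OffsetToData;
}
```

Placing the padding before the header, rather than between the name and the data, keeps the name and data contiguous; this mirrors the `PreHeadPadSize` field the patch adds to `MemberData`.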


unsigned NumSyms, uint64_t PrevMemberOffset = 0, unsigned NumSyms, uint64_t PrevMemberOffset = 0,

jhendersonUnsubmitted

Done

Two nits, and one more significant point.

1) No need for const & for StringRef, which is intended to be copied.
2) ObjStringRef -> Obj.
3) This function appears to be reading in the file and parsing it just to get the alignments. However, presumably this isn't the only place where we have the fully parsed object, since at some point you have to know what to write in the object file in the first place, right? Wouldn't it make more sense to identify and record this alignment then?


DiggerLinAuthorUnsubmitted

Done

yes
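
The first nit above is a general point about non-owning string views. A small sketch, with `std::string_view` standing in for `llvm::StringRef` (both are a pointer plus a length, cheap to copy) and a hypothetical function name:

```cpp
#include <cassert>
#include <string_view>

// The view is taken by value: copying it copies only a pointer and a
// length, so a const reference would add indirection for no benefit.
// The magic string is the one LLVM uses for the big archive format.
bool startsWithBigArchiveMagic(std::string_view Obj) {
  constexpr std::string_view Magic = "<bigaf>\n";
  return Obj.substr(0, Magic.size()) == Magic;
}
```

Callers can pass string literals, `std::string` objects, or other views without any change to the signature.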


uint64_t NextMemberOffset = 0, uint64_t NextMemberOffset = 0,

jhendersonUnsubmitted

Done

return 2;

- // If the XCOFF object file have not a loader section, align at 2.

+ // If the XCOFF object file does not have a loader section, it is not loadable, so align at 2.

if (AuxHeader->SecNumOfLoader <= 0)


jhendersonUnsubmitted

Done

Any reason you didn't adopt this comment suggestion I made?

(You can change "so align at 2." in my suggestion to "so align at the minimum value.")


bool Is64Bit = false) { bool Is64Bit = false) {

// We don't write a symbol table on an archive with no members -- except on // We don't write a symbol table on an archive with no members -- except on

// Darwin, where the linker will abort unless the archive has a symbol table. // Darwin, where the linker will abort unless the archive has a symbol table.

if (StringTable.empty() && !isDarwin(Kind) && !isCOFFArchive(Kind)) if (StringTable.empty() && !isDarwin(Kind) && !isCOFFArchive(Kind))

return; return;

jhendersonUnsubmitted

Done

Seems like this should report the error, not just ignore it...


jhendersonUnsubmitted

Not Done

return 2;

- uint16_t AlignSize = AuxHeader->MaxAlignOfText > AuxHeader->MaxAlignOfData

- ? AuxHeader->MaxAlignOfText

- : AuxHeader->MaxAlignOfData;

+ uint16_t AlignSize = std::max(AuxHeader->MaxAlignOfText, AuxHeader->MaxAlignOfData);

return 1 << (AlignSize > 12 ? MaxAlignSize : AlignSize);


uint64_t OffsetSize = is64BitKind(Kind) ? 8 : 4; uint64_t OffsetSize = is64BitKind(Kind) ? 8 : 4;

uint32_t Pad; uint32_t Pad;

jhendersonUnsubmitted

Done

This value of 12 is a magic number that is rather meaningless to a reader of the code. Please stick it in a named constant somewhere.

Also, is AlignSize (and MaxAlignSize) really an appropriate name for the variable? An alignment value isn't a size, so unless you are aligning a size field or something, it doesn't really make sense as a name.

This line also needs a comment explaining what it is doing and why.


uint64_t Size = computeSymbolTableSize(Kind, NumSyms, OffsetSize, uint64_t Size = computeSymbolTableSize(Kind, NumSyms, OffsetSize,

StringTable.size(), &Pad); StringTable.size(), &Pad);

writeSymbolTableHeader(Out, Kind, Deterministic, Size, PrevMemberOffset, writeSymbolTableHeader(Out, Kind, Deterministic, Size, PrevMemberOffset,

NextMemberOffset); NextMemberOffset);

jhendersonUnsubmitted

Done

// If the desired alignment is > PAGESIZE, 32-bit members are aligned on a

- // word boundary, while 64-bit members are aligned on a PAGESIZE boundary

+ // word boundary, while 64-bit members are aligned on a PAGESIZE boundary.

return XCOFFObj->is64Bit()


if (isBSDLike(Kind)) if (isBSDLike(Kind))

printNBits(Out, Kind, NumSyms * 2 * OffsetSize); printNBits(Out, Kind, NumSyms * 2 * OffsetSize);

stephenpeckhamUnsubmitted

Done

Only loadable objects need to be aligned. One requirement for a loadable module is the presence of a loader section. The o_snloader field in the auxiliary header can be checked.


else else

printNBits(Out, Kind, NumSyms); printNBits(Out, Kind, NumSyms);

uint64_t Pos = MembersOffset; uint64_t Pos = MembersOffset;

for (const MemberData &M : Members) { for (const MemberData &M : Members) {

if (isAIXBigArchive(Kind)) { if (isAIXBigArchive(Kind)) {

Expected<bool> Is64BitOrErr = is64BitSymbolicFile(M.Data); Pos += M.PreHeadPadSize;

jhendersonUnsubmitted

Done

No need for else after return.


// If there is an error, the error will have been emitted when if (is64BitSymbolicFile(M.SymFile.get()) != Is64Bit) {

// 'computeMemberData' called the 'getSymbol' function, so don't need to

// handle it here.

if (!Is64BitOrErr)

cantFail(Is64BitOrErr.takeError());

if (*Is64BitOrErr != Is64Bit) {

Pos += M.Header.size() + M.Data.size() + M.Padding.size(); Pos += M.Header.size() + M.Data.size() + M.Padding.size();

continue; continue;

} }

for (unsigned StringOffset : M.Symbols) { for (unsigned StringOffset : M.Symbols) {

if (isBSDLike(Kind)) if (isBSDLike(Kind))

printNBits(Out, Kind, StringOffset); printNBits(Out, Kind, StringOffset);

… if (!TripleStr)

return false; return false;

Triple T(*TripleStr); Triple T(*TripleStr);

return T.isWindowsArm64EC() || T.getArch() == Triple::x86_64; return T.isWindowsArm64EC() || T.getArch() == Triple::x86_64;

} }

return false; return false;

} }

static Expected<std::vector<unsigned>> static Expected<std::vector<unsigned>> getSymbols(SymbolicFile *Obj,

getSymbols(MemoryBufferRef Buf, uint16_t Index, raw_ostream &SymNames, uint16_t Index,

SymMap *SymMap, bool &HasObject) { raw_ostream &SymNames,

// In the scenario when LLVMContext is populated SymbolicFile will contain a SymMap *SymMap) {

// reference to it, thus SymbolicFile should be destroyed first.

LLVMContext Context;

std::vector<unsigned> Ret; std::vector<unsigned> Ret;

Expected<std::unique_ptr<SymbolicFile>> ObjOrErr =

getSymbolicFile(Buf, Context);

if (!ObjOrErr)

return ObjOrErr.takeError();

// If the member is non-symbolic file, treat it as having no symbols. if (Obj == nullptr)

if (!*ObjOrErr)

return Ret; return Ret;

jhendersonUnsubmitted

Done

Could you please move this back to where it was (immediately before the if (SymMap))? There's no reason to move it, and by moving it you've left it further from its point of use than it needs to be.


std::unique_ptr<object::SymbolicFile> Obj = std::move(*ObjOrErr);

std::map<std::string, uint16_t> *Map = nullptr; std::map<std::string, uint16_t> *Map = nullptr;

if (SymMap) if (SymMap)

Map = SymMap->UseECMap && isECObject(*Obj) ? &SymMap->ECMap : &SymMap->Map; Map = SymMap->UseECMap && isECObject(*Obj) ? &SymMap->ECMap : &SymMap->Map;

HasObject = true;

for (const object::BasicSymbolRef &S : Obj->symbols()) { for (const object::BasicSymbolRef &S : Obj->symbols()) {

if (!isArchiveSymbol(S)) if (!isArchiveSymbol(S))

continue; continue;

if (Map) { if (Map) {

std::string Name; std::string Name;

raw_string_ostream NameStream(Name); raw_string_ostream NameStream(Name);

if (Error E = S.printName(NameStream)) if (Error E = S.printName(NameStream))

return std::move(E); return std::move(E);

… static Expected<std::vector<unsigned>> getSymbols(SymbolicFile *Obj,

} }

return Ret; return Ret;

} }

static Expected<std::vector<MemberData>> static Expected<std::vector<MemberData>>

computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames, computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,

object::Archive::Kind Kind, bool Thin, bool Deterministic, object::Archive::Kind Kind, bool Thin, bool Deterministic,

SymtabWritingMode NeedSymbols, SymMap *SymMap, SymtabWritingMode NeedSymbols, SymMap *SymMap,

ArrayRef<NewArchiveMember> NewMembers) { LLVMContext &Context, ArrayRef<NewArchiveMember> NewMembers) {

static char PaddingData[8] = {'\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n'}; static char PaddingData[8] = {'\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n'};

uint64_t MemHeadPadSize = 0;

jhendersonUnsubmitted

Done

We've had this sort of discussion before: you can't use static local variables like this that will change each time this function is called, because this function could be called from multiple threads.


DiggerLinAuthorUnsubmitted

Done

Thanks for letting me know.
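
The hazard raised in this thread can be shown with a toy accumulator. This is an illustrative sketch only; both function names are hypothetical and neither appears in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Problematic shape: the static accumulator outlives each call and is
// shared by every caller, so results depend on call history, and
// concurrent calls from several threads would race on it.
std::uint64_t totalSizeStatic(const std::vector<std::uint64_t> &Sizes) {
  static std::uint64_t Total = 0;
  for (std::uint64_t S : Sizes)
    Total += S;
  return Total;
}

// Preferred shape: the accumulator is an ordinary local, so each call
// starts fresh and shares no state with other callers.
std::uint64_t totalSizeLocal(const std::vector<std::uint64_t> &Sizes) {
  std::uint64_t Total = 0;
  for (std::uint64_t S : Sizes)
    Total += S;
  return Total;
}
```

Calling the static version twice with the same input returns a different value the second time, even on a single thread, which is the kind of hidden state the reviewer is objecting to.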


uint64_t Pos = uint64_t Pos =

isAIXBigArchive(Kind) ? sizeof(object::BigArchive::FixLenHdr) : 0; isAIXBigArchive(Kind) ? sizeof(object::BigArchive::FixLenHdr) : 0;

std::vector<MemberData> Ret; std::vector<MemberData> Ret;

bool HasObject = false; bool HasObject = false;

// Deduplicate long member names in the string table and reuse earlier name // Deduplicate long member names in the string table and reuse earlier name

// offsets. This especially saves space for COFF Import libraries where all // offsets. This especially saves space for COFF Import libraries where all

[45 lines hidden] computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,

std::map<StringRef, unsigned> FilenameCount; std::map<StringRef, unsigned> FilenameCount;

if (UniqueTimestamps) { if (UniqueTimestamps) {

for (const NewArchiveMember &M : NewMembers) for (const NewArchiveMember &M : NewMembers)

FilenameCount[M.MemberName]++; FilenameCount[M.MemberName]++;

for (auto &Entry : FilenameCount) for (auto &Entry : FilenameCount)

Entry.second = Entry.second > 1 ? 1 : 0; Entry.second = Entry.second > 1 ? 1 : 0;

} }

// The big archive format needs to know the offset of the previous member // The big archive format needs to know the offset of the previous member

jhendersonUnsubmitted

Done

Why has this comment changed only to ADD a typo??

// header. // header.

uint64_t PrevOffset = 0; uint64_t PrevOffset = 0;

uint64_t NextMemHeadPadSize = 0;

std::unique_ptr<SymbolicFile> CurSymFile;

std::unique_ptr<SymbolicFile> NextSymFile;

uint16_t Index = 0; uint16_t Index = 0;

jhendersonUnsubmitted

Done

Why has this variable been renamed?

DiggerLinAuthorUnsubmitted

Done

I thought "Pre" was a more concise abbreviation of "previous" than "Prev", but after searching online I found that "Prev" is better, so I have changed it back.

for (const NewArchiveMember &M : NewMembers) {

for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {

std::string Header; std::string Header;

raw_string_ostream Out(Header); raw_string_ostream Out(Header);

jhendersonUnsubmitted

Done

std::unique_ptr<SymbolicFile> NextSymFile;

- for (auto M = NewMembers.begin(); M < NewMembers.end(); M++) {

+ for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {

std::string Header;

Please re-review the LLVM coding standards. You keep making mistakes, despite being told before, that are covered in the coding standards.

DiggerLinAuthorUnsubmitted

Done

Sorry for the carelessness; I do know that ++M is required by the LLVM coding standards.

MemoryBufferRef Buf = M.Buf->getMemBufferRef(); MemoryBufferRef Buf = M->Buf->getMemBufferRef();

jhendersonUnsubmitted

Done

I'm not sure I understand the changes to this loop. Why do you need to know anything about the next member to know how much padding the current member requires?

DiggerLinAuthorUnsubmitted

Done

The "File Member Header" in a big archive has the field "char ar_nxtmem[20]; /* Next member offset-decimal */", so we need to calculate the padding before the next member's header, if there is any.
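
To illustrate the dependency described here: each header records the offset of the next member, and that offset has to include any padding placed before the next header, so the padding must be known one member ahead. A minimal sketch of the padding arithmetic follows. The helper names `alignUp` and `headPadding` are hypothetical, the header and name sizes are supplied by the caller, and this is not the code from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align. Align must be a power of two.
static uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Number of padding bytes to place before a member header located at
// HeaderPos so that the member's data (which follows the header and the
// even-padded name) starts on an Align boundary.
// HdrSize and NameLen are hypothetical caller-supplied parameters.
static uint64_t headPadding(uint64_t HeaderPos, uint64_t HdrSize,
                            uint64_t NameLen, uint64_t Align) {
  uint64_t OffsetToData = HeaderPos + HdrSize + alignUp(NameLen, 2);
  return alignUp(OffsetToData, Align) - OffsetToData;
}
```

For instance, with a header at offset 128, a 114-byte header, a 4-byte name, and 8-byte alignment, the data would otherwise start at 246, so `headPadding` returns 2.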

StringRef Data = Thin ? "" : Buf.getBuffer(); StringRef Data = Thin ? "" : Buf.getBuffer();

Index++; Index++;

// ld64 expects the members to be 8-byte aligned for 64-bit content and at // ld64 expects the members to be 8-byte aligned for 64-bit content and at

// least 4-byte aligned for 32-bit content. Opt for the larger encoding // least 4-byte aligned for 32-bit content. Opt for the larger encoding

// uniformly. This matches the behaviour with cctools and ensures that ld64 // uniformly. This matches the behaviour with cctools and ensures that ld64

// is happy with archives that we generate. // is happy with archives that we generate.

unsigned MemberPadding = unsigned MemberPadding =

isDarwin(Kind) ? offsetToAlignment(Data.size(), Align(8)) : 0; isDarwin(Kind) ? offsetToAlignment(Data.size(), Align(8)) : 0;

unsigned TailPadding = unsigned TailPadding =

offsetToAlignment(Data.size() + MemberPadding, Align(2)); offsetToAlignment(Data.size() + MemberPadding, Align(2));

StringRef Padding = StringRef(PaddingData, MemberPadding + TailPadding); StringRef Padding = StringRef(PaddingData, MemberPadding + TailPadding);

sys::TimePoint<std::chrono::seconds> ModTime; sys::TimePoint<std::chrono::seconds> ModTime;

if (UniqueTimestamps) if (UniqueTimestamps)

// Increment timestamp for each file of a given name. // Increment timestamp for each file of a given name.

ModTime = sys::toTimePoint(FilenameCount[M.MemberName]++); ModTime = sys::toTimePoint(FilenameCount[M->MemberName]++);

else else

ModTime = M.ModTime; ModTime = M->ModTime;

uint64_t Size = Buf.getBufferSize() + MemberPadding; uint64_t Size = Buf.getBufferSize() + MemberPadding;

if (Size > object::Archive::MaxMemberSize) { if (Size > object::Archive::MaxMemberSize) {

std::string StringMsg = std::string StringMsg =

"File " + M.MemberName.str() + " exceeds size limit"; "File " + M->MemberName.str() + " exceeds size limit";

return make_error<object::GenericBinaryError>( return make_error<object::GenericBinaryError>(

std::move(StringMsg), object::object_error::parse_failed); std::move(StringMsg), object::object_error::parse_failed);

} }

if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) {

auto SetNextSymFile = [&NextSymFile,

jhendersonUnsubmitted

Done

std::move(StringMsg), object::object_error::parse_failed);

}

- // In big archive, we need to calculate and include next member offset

- // and previous member offset in file member header.

+ // In the big archive file format, we need to calculate and include the next

+ // and previous member offset in the file member header.

if (isAIXBigArchive(Kind)) {

(and make sure the comment respects the column limit properly - I haven't attempted to with my edit)

&Context](MemoryBufferRef Buf,

StringRef MemberName) -> Error {

Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =

getSymbolicFile(Buf, Context);

if (!SymFileOrErr)

return createFileError(MemberName, SymFileOrErr.takeError());

NextSymFile = std::move(*SymFileOrErr);

jhendersonUnsubmitted

Done

No need for the braces here.

return Error::success();

};

if (M == NewMembers.begin())

if (Error Err = SetNextSymFile(Buf, M->MemberName))

return std::move(Err);

jhendersonUnsubmitted

Done

Test case needed.

DiggerLinAuthorUnsubmitted

Done

We are just refactoring the code here, not adding new functionality, so I do not think we need a new test case.

jhendersonUnsubmitted

Done

Although I agree that the code to load a symbolic file is just a refactoring of existing code, this use of that code is a new call site, where an error needs checking in a test. If there's an existing test that hits this specific piece of code, then that's fine, but if there isn't, it needs a new test. Otherwise, there would be no test that shows that the error is properly handled in this case.

Same goes below.

DiggerLinAuthorUnsubmitted

Done

There is an existing test for this code, added when you implemented https://reviews.llvm.org/D88288:
https://github.com/llvm/llvm-project/blob/main/llvm/test/Object/archive-malformed-object.test#L13
https://github.com/llvm/llvm-project/blob/main/llvm/test/Object/archive-malformed-object.test#L19

jhendersonUnsubmitted

Done

Okay, thanks. I agree that this check on this line is covered by the existing test. However, archive-malformed-object.test doesn't cover the case where a second or later member is bad, I believe, which means that the check and report of the error at lines 841-843 in this current version of the patch aren't covered still.

Also, that test only covers it for the case where symbol tables are needed (i.e. when llvm-ar without the "S" option is used). Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.

You could cover both missing cases sufficiently by adding the following two test cases somewhere:

llvm-ar rcS archive.a bad.o
llvm-ar rcS archive.a good.o bad.o

where archive.a is a big archive, bad.o is a file that will cause getSymbolicFile to fail, and good.o is a file that it will work with. (Obviously you'd rename these files in the real test)

DiggerLinAuthorUnsubmitted

Done

Okay, thanks. I agree that this check on this line is covered by the existing test. However, archive-malformed-object.test doesn't cover the case where a second or later member is bad, I believe, which means that the check and report of the error at lines 841-843 in this current version of the patch aren't covered still.

I will add a new test scenario to cover it.

Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.

What is the purpose of the test? Do you want to test whether the code enters the following block when the isAIXBigArchive(Kind) part of if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) is true, or to test the functionality of the following block (we already have a test scenario for that)?

{auto SetNextSymFile = [&NextSymFile,
    .....
    if ((M + 1) != NewMembers.end())
        if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(),
                                       (M + 1)->MemberName))
          return std::move(Err);
    }

If the goal is to check that the code enters the block above, I think the alignment functionality would not work unless it did, and we already have a test scenario for big archive alignment. I have added the test scenario anyway.

jhendersonUnsubmitted

Not Done

what is the purpose the test ?

Unlike the last few tests I've requested, which test very specific code paths, this is more of a high-level-thinking test (often called a "black box test"), i.e. one where we don't know what the details of the code are. Specifically, the aim of the test is to show that "For Big Archive, if an input object can't be loaded, we correctly report an error." Such high-level tests are useful in addition to the low-level coverage tests, because they are less likely to be invalidated by subsequent changes to the code (for example, if somebody changed this code path to be big archive only, because they decide to handle symbol tables differently, some of the test coverage provided by the existing test would no longer cover this area of code).

High-level tests can often provide low-level coverage at the same time, and similarly low-level coverage tests can cover high-level topics at the same time (e.g. the other test you've recently modified covers "For archives that require a symbol table, if an input object can't be loaded, we correctly report an error"). A consequence of this dual-purpose is that sometimes there is overlap in terms of raw coverage that tests provide, as you've seen here.

CurSymFile = std::move(NextSymFile);

if ((M + 1) != NewMembers.end())

if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(),

(M + 1)->MemberName))

return std::move(Err);

jhendersonUnsubmitted

Done

Test case needed.

DiggerLinAuthorUnsubmitted

Done

Same reason as above.

}

// In the big archive file format, we need to calculate and include the next

// member offset and previous member offset in the file member header.

if (isAIXBigArchive(Kind)) { if (isAIXBigArchive(Kind)) {

uint64_t OffsetToMemData = Pos + sizeof(object::BigArMemHdrType) +

alignTo(M->MemberName.size(), 2);

if (M == NewMembers.begin())

NextMemHeadPadSize =

alignToPowerOf2(OffsetToMemData,

getMemberAlignment(CurSymFile.get())) -

OffsetToMemData;

MemHeadPadSize = NextMemHeadPadSize;

Pos += MemHeadPadSize;

uint64_t NextOffset = Pos + sizeof(object::BigArMemHdrType) + uint64_t NextOffset = Pos + sizeof(object::BigArMemHdrType) +

alignTo(M.MemberName.size(), 2) + alignTo(Size, 2); alignTo(M->MemberName.size(), 2) + alignTo(Size, 2);

jhendersonUnsubmitted

Done

Some more comments explaining why you're doing what you're doing in this block would be good.

printBigArchiveMemberHeader(Out, M.MemberName, ModTime, M.UID, M.GID,

M.Perms, Size, PrevOffset, NextOffset); // If there is another member file after this, we need to calculate the

// padding before the header.

jhendersonUnsubmitted

Done

alignTo(M->MemberName.size(), 2) + alignTo(Size, 2);

- // If there is next member file. we need to calculate the padding before

- // the header if there is.

+ // If there is another member file after this, we need to calculate the padding before

+ // the header.

if ((M + 1) != NewMembers.end()) {

if ((M + 1) != NewMembers.end()) {

uint64_t OffsetToNextMemData = NextOffset +

sizeof(object::BigArMemHdrType) +

alignTo((M + 1)->MemberName.size(), 2);

NextMemHeadPadSize =

alignToPowerOf2(OffsetToNextMemData,

getMemberAlignment(NextSymFile.get())) -

OffsetToNextMemData;

NextOffset += NextMemHeadPadSize;

}

printBigArchiveMemberHeader(Out, M->MemberName, ModTime, M->UID, M->GID,

M->Perms, Size, PrevOffset, NextOffset);

PrevOffset = Pos; PrevOffset = Pos;

} else { } else {

printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, M, printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M,

stephenpeckhamUnsubmitted

Done

It's possible to have a loadable object with a very large text or data alignment. A sanity check would be useful here. The AIX ar command caps the alignment at 2^12 (the typical PAGESIZE on an AIX system).
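
The cap suggested here can be sketched as follows. This assumes the auxiliary header stores the two maximum alignments as log2 exponents, and the helper name `cappedMemberAlignment` is hypothetical; it is an illustration, not the function from the patch:

```cpp
#include <algorithm>
#include <cstdint>

// Alignment in bytes for a member's data: the larger of the .text and .data
// maximum alignments, capped at 2^12 as the AIX ar command is said to do.
// Both inputs are assumed to be log2 exponents (an assumption of this sketch).
static uint64_t cappedMemberAlignment(unsigned TextAlignLog2,
                                      unsigned DataAlignLog2) {
  constexpr unsigned MaxAlignLog2 = 12;
  unsigned Log2 = std::max(TextAlignLog2, DataAlignLog2);
  return uint64_t{1} << std::min(Log2, MaxAlignLog2);
}
```

With exponents 3 and 5 this gives 32 bytes, while an exponent of 20 would be clamped to 4096 bytes.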

DiggerLinAuthorUnsubmitted

Done

Pos points to the beginning of the member file's header, which is not aligned; only the data of the member file is aligned.

ModTime, Size); ModTime, Size);

jhendersonUnsubmitted

Done

Is this value correct for the last member in an archive?

DiggerLinAuthorUnsubmitted

Done

Yes, the value is correct. The NextOffset of the last file member points to the "Member Table", and there is no special requirement on the content of the "Member Table".
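
A sketch of how the previous/next links could chain, under the assumption that each member is described only by its head padding and its total size (header, name, and data, already rounded to an even length). The struct and function names are hypothetical and do not come from the patch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct MemberSize {
  uint64_t HeadPad; // padding placed before this member's header
  uint64_t Size;    // header + name + data, already rounded to an even size
};

struct MemberLinks {
  uint64_t Offset; // where this member's header starts
  uint64_t Prev;   // offset of the previous header (0 for the first member)
  uint64_t Next;   // offset of the next header, or of whatever follows the
                   // last member
};

// Walk the members once, carrying the running position forward. The padding
// of member I+1 is folded into the Next link written for member I.
static std::vector<MemberLinks>
linkMembers(uint64_t FirstOffset, const std::vector<MemberSize> &Members) {
  std::vector<MemberLinks> Links;
  uint64_t Pos = FirstOffset;
  uint64_t Prev = 0;
  for (std::size_t I = 0; I != Members.size(); ++I) {
    Pos += Members[I].HeadPad;
    uint64_t End = Pos + Members[I].Size;
    uint64_t NextPad = I + 1 != Members.size() ? Members[I + 1].HeadPad : 0;
    Links.push_back({Pos, Prev, End + NextPad});
    Prev = Pos;
    Pos = End;
  }
  return Links;
}
```

In this sketch the last member's `Next` is simply the end of its own data, with no extra padding added, which is consistent with the reply that the final NextOffset points at the Member Table.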

} }

Out.flush(); Out.flush();

std::vector<unsigned> Symbols; std::vector<unsigned> Symbols;

if (NeedSymbols != SymtabWritingMode::NoSymtab) { if (NeedSymbols != SymtabWritingMode::NoSymtab) {

Expected<std::vector<unsigned>> SymbolsOrErr = Expected<std::vector<unsigned>> SymbolsOrErr =

getSymbols(Buf, Index, SymNames, SymMap, HasObject); getSymbols(CurSymFile.get(), Index, SymNames, SymMap);

if (!SymbolsOrErr) if (!SymbolsOrErr)

jhendersonUnsubmitted

Done

If I'm not mistaken, the implication of this code is that you will no longer be able to store non-symbolic-files in BigArchives. Is that intended (and if so, is it correct)? For other formats, you can store non-symbolic files as archive members. They just don't contribute anything to the symbol table.

DiggerLinAuthorUnsubmitted

Done

in the function

static Expected<std::unique_ptr<SymbolicFile>>
getSymbolFile(MemoryBufferRef Buf, LLVMContext &Context) {
  std::unique_ptr<object::SymbolicFile> Obj;
  const file_magic Type = identify_magic(Buf.getBuffer());
  // Treat non symbolic file types as nullptr.
  if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
    return nullptr;

It treats non-symbolic file types as nullptr; it does not return an error.

return createFileError(M.MemberName, SymbolsOrErr.takeError()); return createFileError(M->MemberName, SymbolsOrErr.takeError());

Symbols = std::move(*SymbolsOrErr); Symbols = std::move(*SymbolsOrErr);

if (CurSymFile)

jhendersonUnsubmitted

Done

If I'm not mistaken, you could avoid a lot of this duplicate logic about the SymbolicFile by pulling it out of the isAIXBigArchive code. Something like this outline:

for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
  if (NeedSymbols || isAIXBigArchive(Kind)) {
    if (M == NewMembers.begin()) {
      CurSymFile = // load symbolic file for first member
    } else {
      CurSymFile = std::move(NextSymFile);
    }
    if (M + 1 != NewMembers.end()) {
      NextSymFile = // load symbolic file for next member
    }

    // Do all of loop logic
  }
}

You'll need to handle non-symbolic files somehow, perhaps by just resetting the relevant std::unique_ptr to leave it empty, and then checking whether the pointer is empty before trying to read it.

DiggerLinAuthorUnsubmitted

Done

In the if (M == NewMembers.begin()) block, it not only opens CurSymFile but also calculates MemHeadPadSize.

In the if ((M + 1) != NewMembers.end()) block, it not only opens the symbolic file but also calculates NextMemHeadPadSize.

Even if I follow your suggestion, I still need if (M == NewMembers.begin()) and if ((M + 1) != NewMembers.end()) later for MemHeadPadSize and NextMemHeadPadSize.

You'll need to handle non-symbolic files somehow, perhaps by just resetting the relevant std::unique_ptr to leave it empty, and then checking whether the pointer is empty before trying to read it.

The function getSymbolFile already returns nullptr for non-symbolic files.

jhendersonUnsubmitted

Done

My complaint is that structurally, this and the above block are largely the same. Yes, one does a bit more than the other, but ultimately, the two are structurally identical.

You could refactor the code to avoid this structural duplication, without too much difficulty, using my suggested code above. To do the extra work of calculating the padding sizes, simply add "if (isAIXBigArchive(Kind))" clauses:

for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
  if (NeedSymbols || isAIXBigArchive(Kind)) {
    if (M == NewMembers.begin()) {
      CurSymFile = // load symbolic file for first member
      if (isAIXBigArchive(Kind)) {
        // Do stuff with padding, as needed.
      }
    } else {
      CurSymFile = std::move(NextSymFile);
    }
    if (M + 1 != NewMembers.end()) {
      NextSymFile = // load symbolic file for next member
      if (isAIXBigArchive(Kind)) {
        // Do stuff with padding, as needed.
      }
    }

    // Do all of loop logic (or do some before the ifs and some after).
  }
}

DiggerLinAuthorUnsubmitted

Done

If I change it as you suggest, it only removes this code:

if (!isAIXBigArchive(Kind)) {
       Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
           getSymbolFile(Buf, Context);
       if (!SymFileOrErr)
         return createFileError(M->MemberName, SymFileOrErr.takeError());
       CurSymFile = std::move(*SymFileOrErr);
     }

I would still need to put the variables NextOffset and OffsetToMemData under the if (NeedSymbols || isAIXBigArchive(Kind)), and there would be several if (isAIXBigArchive(Kind)) checks nested under it.

We would also have a duplicated call to
printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M, ModTime, Size);

 if (NeedSymbols || isAIXBigArchive(Kind)) {
   ...
  if(!isAIXBigArchive(Kind)) 
    printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M, ModTime, Size);
}
 else {
     printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M, ModTime, Size);
  }

I do not think the logic of your suggestion is clearer than the current code. If you strongly prefer it, I can change it as you suggest.

jhendersonUnsubmitted

Done

The changes I made were as follows:

for (...) {
  ... after the existing file size limit check ...

  if (NeedSymbols || isAIXBigArchive(Kind)) {
    auto SetNextSymFile = [&NextSymFile,
                           &Context](MemoryBufferRef Buf,
                                     StringRef MemberName) -> Error {
      Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
          getSymbolicFile(Buf, Context);
      if (!SymFileOrErr) {
        return createFileError(MemberName, SymFileOrErr.takeError());
      }
      NextSymFile = std::move(*SymFileOrErr);
      return Error::success();
    };

    if (M == NewMembers.begin())
      if (Error Err = SetNextSymFile(Buf, M->MemberName))
        return std::move(Err);
    CurSymFile = std::move(NextSymFile);

    if (M + 1 != NewMembers.end())
      if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(), (M + 1)->MemberName))
        return std::move(Err);
  }

  if (isAIXBigArchive(Kind)) {
    uint64_t OffsetToMemData = Pos + sizeof(object::BigArMemHdrType) +
                               alignTo(M->MemberName.size(), 2);

    if (M == NewMembers.begin()) {
      MemHeadPadSize = alignToPowerOf2(OffsetToMemData,
                                       getMemberAlignment(CurSymFile.get())) -
                       OffsetToMemData;
    } else {
      MemHeadPadSize = NextMemHeadPadSize;
    }

    ... update Pos, calculate NextOffset etc as before, then write header ...
  } else {
    printMemberHeader(...);
  }
  Out.flush();

  std::vector<unsigned> Symbols;
  if (NeedSymbols) {
    Expected<std::vector<unsigned>> SymbolsOrErr =
        getSymbols(CurSymFile.get(), Index, SymNames, SymMap);
    if (!SymbolsOrErr)
      return createFileError(M->MemberName, SymbolsOrErr.takeError());
    Symbols = std::move(*SymbolsOrErr);
    if (CurSymFile) // Can we remove this if check and set HasObject to true where CurSymFile is set?
      HasObject = true;
  }

  ... update Pos and Ret ...
}

Aside: is the check on CurSymFile for the HasObject setting necessary? This potentially could be simplified, but I haven't got the time to follow this logic.

jhenderson: Either you've misunderstood my suggestion or I've missed something. I'm pretty sure I haven't…

DiggerLinAuthorUnsubmitted

Done

Thanks, I have followed your suggestion.

HasObject = true;

jhendersonUnsubmitted

Done

Did you see my aside about HasObject at the end of my long inline comment? Can the logic around HasObject be simplified in the latest version. I've not followed it enough to be confident either way.

DiggerLinAuthorUnsubmitted

Done

Can the logic around HasObject be simplified in the latest version.

I do not think the HasObject logic can be simplified. In the loop

   for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {  
.....
    if (CurSymFile)
       HasObject = true;
 ....
}
.....
   if (HasObject && SymNames.tell() == 0 && !isCOFFArchive(Kind))
   SymNames << '\0' << '\0' << '\0';
 return Ret;

If any member is a SymbolicFile, SymNames is empty, and !isCOFFArchive(Kind) holds, it will emit

SymNames << '\0' << '\0' << '\0';

I do not think we can simplify it.

} }

Pos += Header.size() + Data.size() + Padding.size(); Pos += Header.size() + Data.size() + Padding.size();

Ret.push_back({std::move(Symbols), std::move(Header), Data, Padding}); Ret.push_back({std::move(Symbols), std::move(Header), Data, Padding,

MemHeadPadSize, std::move(CurSymFile)});

} }

// If there are no symbols, emit an empty symbol table, to satisfy Solaris // If there are no symbols, emit an empty symbol table, to satisfy Solaris

// tools, older versions of which expect a symbol table in a non-empty // tools, older versions of which expect a symbol table in a non-empty

// archive, regardless of whether there are any symbols in it. // archive, regardless of whether there are any symbols in it.

if (HasObject && SymNames.tell() == 0 && !isCOFFArchive(Kind)) if (HasObject && SymNames.tell() == 0 && !isCOFFArchive(Kind))

SymNames << '\0' << '\0' << '\0'; SymNames << '\0' << '\0' << '\0';

return Ret; return Ret;

} }

[54 lines hidden] static Error writeArchiveToStream(raw_ostream &Out,

raw_svector_ostream StringTable(StringTableBuf); raw_svector_ostream StringTable(StringTableBuf);

SymMap SymMap; SymMap SymMap;

// COFF symbol map uses 16-bit indexes, so we can't use it if there are too // COFF symbol map uses 16-bit indexes, so we can't use it if there are too

// many members. // many members.

if (isCOFFArchive(Kind) && NewMembers.size() > 0xfffe) if (isCOFFArchive(Kind) && NewMembers.size() > 0xfffe)

Kind = object::Archive::K_GNU; Kind = object::Archive::K_GNU;

// In the scenario when LLVMContext is populated SymbolicFile will contain a

// reference to it, thus SymbolicFile should be destroyed first.

LLVMContext Context;

SymMap.UseECMap = IsEC; SymMap.UseECMap = IsEC;

Expected<std::vector<MemberData>> DataOrErr = computeMemberData( Expected<std::vector<MemberData>> DataOrErr = computeMemberData(

StringTable, SymNames, Kind, Thin, Deterministic, WriteSymtab, StringTable, SymNames, Kind, Thin, Deterministic, WriteSymtab,

isCOFFArchive(Kind) ? &SymMap : nullptr, NewMembers); isCOFFArchive(Kind) ? &SymMap : nullptr, Context, NewMembers);

if (Error E = DataOrErr.takeError()) if (Error E = DataOrErr.takeError())

return E; return E;

std::vector<MemberData> &Data = *DataOrErr; std::vector<MemberData> &Data = *DataOrErr;

uint64_t StringTableSize = 0; uint64_t StringTableSize = 0;

MemberData StringTableMember; MemberData StringTableMember;

if (!StringTableBuf.empty() && !isAIXBigArchive(Kind)) { if (!StringTableBuf.empty() && !isAIXBigArchive(Kind)) {

StringTableMember = computeStringTable(StringTableBuf); StringTableMember = computeStringTable(StringTableBuf);

StringTableSize = StringTableMember.Header.size() + StringTableSize = StringTableMember.Header.size() +

StringTableMember.Data.size() + StringTableMember.Data.size() +

StringTableMember.Padding.size(); StringTableMember.Padding.size();

} }

// We would like to detect if we need to switch to a 64-bit symbol table. // We would like to detect if we need to switch to a 64-bit symbol table.

uint64_t LastMemberEndOffset = 0; uint64_t LastMemberEndOffset = 0;

uint64_t LastMemberHeaderOffset = 0; uint64_t LastMemberHeaderOffset = 0;

uint64_t NumSyms = 0; uint64_t NumSyms = 0;

uint64_t NumSyms32 = 0; // Store symbol number of 32-bit member files. uint64_t NumSyms32 = 0; // Store symbol number of 32-bit member files.

bool ShouldWriteSymtab = WriteSymtab != SymtabWritingMode::NoSymtab; bool ShouldWriteSymtab = WriteSymtab != SymtabWritingMode::NoSymtab;

for (const auto &M : Data) { for (const auto &M : Data) {

// Record the start of the member's offset // Record the start of the member's offset

LastMemberEndOffset += M.PreHeadPadSize;

LastMemberHeaderOffset = LastMemberEndOffset; LastMemberHeaderOffset = LastMemberEndOffset;

// Account for the size of each part associated with the member. // Account for the size of each part associated with the member.

LastMemberEndOffset += M.Header.size() + M.Data.size() + M.Padding.size(); LastMemberEndOffset += M.Header.size() + M.Data.size() + M.Padding.size();

NumSyms += M.Symbols.size(); NumSyms += M.Symbols.size();

// AIX big archive files may contain two global symbol tables. The // AIX big archive files may contain two global symbol tables. The

// first global symbol table locates 32-bit file members that define global // first global symbol table locates 32-bit file members that define global

// symbols; the second global symbol table does the same for 64-bit file // symbols; the second global symbol table does the same for 64-bit file

// members. As a big archive can have both 32-bit and 64-bit file members, // members. As a big archive can have both 32-bit and 64-bit file members,

// we need to know the number of symbols in each symbol table individually. // we need to know the number of symbols in each symbol table individually.

if (isAIXBigArchive(Kind) && ShouldWriteSymtab) { if (isAIXBigArchive(Kind) && ShouldWriteSymtab) {

Expected<bool> Is64BitOrErr = is64BitSymbolicFile(M.Data); if (!is64BitSymbolicFile(M.SymFile.get()))

if (Error E = Is64BitOrErr.takeError())

return E;

if (!*Is64BitOrErr)

NumSyms32 += M.Symbols.size(); NumSyms32 += M.Symbols.size();

} }

std::optional<uint64_t> HeadersSize; std::optional<uint64_t> HeadersSize;

// The symbol table is put at the end of the big archive file. The symbol // The symbol table is put at the end of the big archive file. The symbol

// table is at the start of the archive file for other archive formats. // table is at the start of the archive file for other archive formats.

if (ShouldWriteSymtab && !is64BitKind(Kind)) { if (ShouldWriteSymtab && !is64BitKind(Kind)) {

// We assume 32-bit offsets to see if 32-bit symbols are possible or not. // We assume 32-bit offsets to see if 32-bit symbols are possible or not.

[64 lines hidden] if (!isAIXBigArchive(Kind)) {

  uint64_t MemberTableNameStrTblSize = 0;
  std::vector<size_t> MemberOffsets;
  std::vector<StringRef> MemberNames;
  // Loop across object to find offset and names.
  uint64_t MemberEndOffset = sizeof(object::BigArchive::FixLenHdr);
  for (size_t I = 0, Size = NewMembers.size(); I != Size; ++I) {
    const NewArchiveMember &Member = NewMembers[I];
    MemberTableNameStrTblSize += Member.MemberName.size() + 1;
+   MemberEndOffset += Data[I].PreHeadPadSize;
    MemberOffsets.push_back(MemberEndOffset);
    MemberNames.push_back(Member.MemberName);
    // File member name ended with "`\n". The length is included in
    // BigArMemHdrType.
    MemberEndOffset += sizeof(object::BigArMemHdrType) +
                       alignTo(Data[I].Data.size(), 2) +
                       alignTo(Member.MemberName.size(), 2);
  }

  // AIX member table size.
  uint64_t MemberTableSize = 20 + // Number of members field
                             20 * MemberOffsets.size() +
                             MemberTableNameStrTblSize;

  SmallString<0> SymNamesBuf32;
  SmallString<0> SymNamesBuf64;
  raw_svector_ostream SymNames32(SymNamesBuf32);
  raw_svector_ostream SymNames64(SymNamesBuf64);

  if (ShouldWriteSymtab && NumSyms)
    // Generate the symbol names for the members.
-   for (const NewArchiveMember &M : NewMembers) {
-     MemoryBufferRef Buf = M.Buf->getMemBufferRef();
-     Expected<bool> Is64BitOrErr = is64BitSymbolicFile(Buf.getBuffer());
-     if (!Is64BitOrErr)
-       return Is64BitOrErr.takeError();
-
-     bool HasObject;
-     Expected<std::vector<unsigned>> SymbolsOrErr =
-         getSymbols(Buf, 0, *Is64BitOrErr ? SymNames64 : SymNames32, nullptr,
-                    HasObject);
+   for (const auto &M : Data) {
+     Expected<std::vector<unsigned>> SymbolsOrErr = getSymbols(
+         M.SymFile.get(), 0,
+         is64BitSymbolicFile(M.SymFile.get()) ? SymNames64 : SymNames32,
+         nullptr);
      if (!SymbolsOrErr)
        return SymbolsOrErr.takeError();
    }

  uint64_t MemberTableEndOffset =
      LastMemberEndOffset +
      alignTo(sizeof(object::BigArMemHdrType) + MemberTableSize, 2);

... (22 lines not shown) ...

  // Fixed Sized Header.
  printWithSpacePadding(Out, NewMembers.size() ? LastMemberEndOffset : 0,
                        20); // Offset to member table
  // If there are no file members in the archive, there will be no global
  // symbol table.
  printWithSpacePadding(Out, GlobalSymbolOffset, 20);
  printWithSpacePadding(Out, GlobalSymbolOffset64, 20);
- printWithSpacePadding(
-     Out, NewMembers.size() ? sizeof(object::BigArchive::FixLenHdr) : 0,
+ printWithSpacePadding(Out,
+                       NewMembers.size()
+                           ? sizeof(object::BigArchive::FixLenHdr) +
+                                 Data[0].PreHeadPadSize
+                           : 0,
      20); // Offset to first archive member
  printWithSpacePadding(Out, NewMembers.size() ? LastMemberHeaderOffset : 0,
                        20); // Offset to last archive member
  printWithSpacePadding(
      Out, 0,
      20); // Offset to first member of free list - Not supported yet

  for (const MemberData &M : Data) {
+   Out << std::string(M.PreHeadPadSize, '\0');

jhenderson (Done):

for (const MemberData &M : Data) {

- for (uint64_t i = 0; i < M.PreHeadPadSize; i++)

+ for (uint64_t i = 0; i < M.PreHeadPadSize; ++i)

Out << '\0';

It may be slightly more efficient to do something like:

OS << std::string(M.PreHeadPadSize, '\0');

It's certainly a little more elegant. Alternatively, you could use std::fill_n. There are good explanations of both these at https://stackoverflow.com/a/11421689.

DiggerLin (Author, Done):

We need to write M.PreHeadPadSize '\0' bytes. With std::string(M.PreHeadPadSize, '\0'), the string is an empty string, so OS << std::string(M.PreHeadPadSize, '\0') does not output M.PreHeadPadSize '\0' bytes.

jhenderson (Done):

Did you actually try my suggestion? Your statement is wrong. I've tried it with some simple code and it works fine - OS << std::string(10, '\0') writes 10 null bytes to the output.

std::string(M.PreHeadPadSize, '\0') constructs a std::string that contains M.PreHeadPadSize null bytes. It is NOT an empty string (although using .c_str() will make it look like it is).
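
The behaviour described in this comment can be checked in isolation. The following is only a minimal sketch, not code from the patch: it uses `std::ostringstream` as a stand-in for LLVM's `raw_ostream`, and the helper names `padWithString`, `padWithFillN`, `cStringLength` and `checkPadding` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <sstream>
#include <string>

// Writes `Count` NUL bytes by streaming a std::string built with the
// (count, char) constructor. The string's size is `Count`, not zero.
std::string padWithString(std::size_t Count) {
  std::ostringstream OS; // stand-in for raw_ostream (assumption)
  OS << std::string(Count, '\0');
  return OS.str();
}

// Writes the same bytes with std::fill_n over an output iterator.
std::string padWithFillN(std::size_t Count) {
  std::ostringstream OS;
  std::fill_n(std::ostreambuf_iterator<char>(OS), Count, '\0');
  return OS.str();
}

// Length as seen through c_str(): strlen stops at the first NUL byte,
// which is why such a string can look empty.
std::size_t cStringLength(const std::string &S) {
  return std::strlen(S.c_str());
}

// Small self-check of the three observations above.
void checkPadding() {
  assert(padWithString(10).size() == 10);
  assert(padWithFillN(10) == padWithString(10));
  assert(cStringLength(std::string(10, '\0')) == 0);
}
```

Both helpers produce the same ten bytes; only the view through `c_str()` reports a length of zero.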

DiggerLin (Author, Done):

It works, thanks.

    Out << M.Header << M.Data;
    if (M.Data.size() % 2)
      Out << '\0';
  }

  if (NewMembers.size()) {
    // Member table.
    printBigArchiveMemberHeader(Out, "", sys::toTimePoint(0), 0, 0, 0,

... (87 lines not shown) ...

llvm/test/Object/archive-malformed-object.test

	## Show that the archive library emits error messages when adding malformed			## Show that the archive library emits error messages when adding malformed
	## objects.			## objects.

	# RUN: rm -rf %t.dir			# RUN: rm -rf %t.dir
	# RUN: split-file %s %t.dir			# RUN: split-file %s %t.dir
	# RUN: cd %t.dir			# RUN: cd %t.dir

	## Malformed bitcode object.			## Malformed bitcode object is the first file member of archive if the symbol table is required.
	# RUN: llvm-as input.ll -o input.bc			# RUN: llvm-as input.ll -o input.bc
				# RUN: cp input.bc good.bc
	# RUN: %python -c "with open('input.bc', 'a') as f: f.truncate(10)"			# RUN: %python -c "with open('input.bc', 'a') as f: f.truncate(10)"
	# RUN: not llvm-ar rc bad.a input.bc 2>&1 \| FileCheck %s --check-prefix=ERR1			# RUN: not llvm-ar rc bad.a input.bc 2>&1 \| FileCheck %s --check-prefix=ERR1

				## Malformed bitcode object is the last file member of archive if the symbol table is required.
				# RUN: rm -rf bad.a
				jhenderson (Done): I don't think this test should use an XCOFF object. XCOFF is not a well-known format, so using it purely to provide a "good" first object will make it look like it has been chosen very deliberately, which confuses the purpose of the test. Better would be to have another bitcode object that isn't malformed as the first object. You can then have `bad.bc` and `good.bc`, which more clearly indicates what is important.
				# RUN: not llvm-ar rc bad.a good.bc input.bc 2>&1 \| FileCheck %s --check-prefix=ERR1

				jhenderson (Done): I'm pretty sure you don't want the `S` here? Compare this command to the one above. If you are using a non-big archive format (which will be the default on many people's systems), then the symbols are only loaded if `S` is not present. If the symbols aren't loaded, the bitcode file won't result in an error (I think).
				DiggerLin (Author, Done): I need the `S` to address your earlier comment. The archive format always depends on the first object file among the llvm-ar arguments (regardless of the OS); when that file is an XCOFF object file, the archive is a big archive. The comment was: `Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.`
				jhenderson (Not Done): Oh, okay I misremembered how the format selection works. Using `--format=bigarchive` is probably a good idea anyway for clarity. My expectation with the suggestion you've quoted was that `xcoff.o` would be the file that couldn't be read, rather than your test case mixing bitcode and xcoff files. I think mixing formats might be a little bit confusing, so I'd either have one/two xcoff files (so the same pair of cases as the invalid bitcode with symbol table cases above, but using xcoff files), or even just the same as the previous two cases, but with `S` (and the `--format=bigarchive` option), using the same bitcode files. I have no particular preference (but it probably is a good idea to include both "first" and "not first" cases for completeness).
				## Malformed bitcode object if the symbol table is not required for big archive.
				# RUN: rm -rf bad.a
				# RUN: not llvm-ar --format=bigarchive rcS bad.a input.bc 2>&1 \| FileCheck %s --check-prefix=ERR1
				# RUN: rm -rf bad.a
				# RUN: not llvm-ar --format=bigarchive rcS bad.a good.bc input.bc 2>&1 \| FileCheck %s --check-prefix=ERR1

	# ERR1: error: bad.a: 'input.bc': Invalid bitcode signature			# ERR1: error: bad.a: 'input.bc': Invalid bitcode signature

	## Non-bitcode malformed file.			## Non-bitcode malformed file.
	# RUN: yaml2obj input.yaml -o input.o			# RUN: yaml2obj input.yaml -o input.o
	# RUN: not llvm-ar rc bad.a input.o 2>&1 \| FileCheck %s --check-prefix=ERR2			# RUN: not llvm-ar rc bad.a input.o 2>&1 \| FileCheck %s --check-prefix=ERR2

	# ERR2: error: bad.a: 'input.o': section header table goes past the end of the file: e_shoff = 0x9999			# ERR2: error: bad.a: 'input.o': section header table goes past the end of the file: e_shoff = 0x9999

	## Don't emit an error if the symbol table is not required.			## Don't emit an error if the symbol table is not required for formats other than the big archive format.
	# RUN: llvm-ar rcS good.a input.o input.bc			# RUN: llvm-ar --format=gnu rcS good.a input.o input.bc
	# RUN: llvm-ar t good.a \| FileCheck %s --check-prefix=CONTENTS			# RUN: llvm-ar t good.a \| FileCheck %s --check-prefix=CONTENTS
				jhenderson (Not Done): Do we need to be explicit about gnu format? If the archive format is derived from the first member, this won't be big archive, right? If we don't need to be explicit, please remove the `--format` option, so that it can capture more cases. If for whatever reason it would be bigarchive without the format option, it's fine to leave as-is. In either case, I suggest changing the comment to say "not required for formats other than the big archive format." as it's not specifically gnu format here that's important.
				DiggerLin (Author, Done): input.o is a malformed object file, so the archive format will depend on getDefaultKindForHost(). The --format=gnu option is therefore needed here.

	# CONTENTS: input.o			# CONTENTS: input.o
	# CONTENTS-NEXT: input.bc			# CONTENTS-NEXT: input.bc

	#--- input.ll			#--- input.ll
	target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-pc-linux"			target triple = "x86_64-pc-linux"

	#--- input.yaml			#--- input.yaml
	--- !ELF			--- !ELF
	FileHeader:			FileHeader:
	Class: ELFCLASS64			Class: ELFCLASS64
	Data: ELFDATA2LSB			Data: ELFDATA2LSB
	Type: ET_REL			Type: ET_REL
	EShOff: 0x9999			EShOff: 0x9999

llvm/test/tools/llvm-ar/big-archive-xcoff-align.test

This file was added.

## Test the alignment of XCOFF object files in the big archive format.

# RUN: rm -rf %t && mkdir %t

# RUN: cd %t

# RUN: yaml2obj --docnum=1 -DFLAG=0x1DF -DSECNAME=.data %s -o t32_1.o

jhenderson (Done):

## header, is aligned in a big archive based on the MAX(maximum alignment

- ## of .text, maximum alignment of .data)

+ ## of .text, maximum alignment of .data).

# RUN: rm -rf %t && mkdir %t

# RUN: yaml2obj --docnum=1 -DFLAG=0x1F7 -DSECNAME=.data %s -o t64_1.o

jhenderson (Done):

Delete this blank line - the comment above is associated with the following test case, but the blank line suggests it either doesn't, or that it applies to the whole test file.

# RUN: yaml2obj --docnum=1 -DFLAG=0x1DF -DSECNAME=.text %s -o t32_2.o

# RUN: yaml2obj --docnum=1 -DFLAG=0x1F7 -DSECNAME=.text %s -o t64_2.o

jhenderson (Done):

Don't rely on inputs in another part of the test tree. It is quite possible that these files will move/be deleted etc, and people won't expect that to impact tests in another part of the testing tree. Instead, you should create these files using yaml2obj.

The other aspect of this is that I have no way of telling that the inputs you're using actually have the properties that you are trying to test for (without inspecting the binaries). Having the yaml2obj form of them would enable this.

# RUN: yaml2obj --docnum=2 -DFLAG=0x1DF %s -o t32_nomaxdata_text.o

# RUN: yaml2obj --docnum=2 -DFLAG=0x1F7 %s -o t64_nomaxdata_text.o

# RUN: yaml2obj --docnum=3 -DFLAG=0x1DF %s -o t32_maxdata_text.o

# RUN: yaml2obj --docnum=3 -DFLAG=0x1F7 %s -o t64_maxdata_text.o

jhenderson (Done):

Nit: prefer --check-prefix over -check-prefix in new tests. Applies throughout.

# RUN: yaml2obj --docnum=4 -DFLAG=0x1DF %s -o t32_noloader.o

# RUN: yaml2obj --docnum=4 -DFLAG=0x1F7 %s -o t64_noloader.o

# RUN: yaml2obj --docnum=5 -DFLAG=0x1DF %s -o t32_excess.o

# RUN: yaml2obj --docnum=5 -DFLAG=0x1F7 %s -o t64_excess.o

# RUN: echo -e "import sys\nf=open(sys.argv[1],\"rb\");f.seek(int(sys.argv[2]));print(f.read(2));f.close()" > print_magic.py

## Test that the content of an XCOFF object file, which has an auxiliary header,

## is aligned in a big archive based on the content of auxiliary header.

# RUN: env OBJECT_MODE=32_64 llvm-ar -q t_aux.a t32_nomaxdata_text.o t64_nomaxdata_text.o t32_maxdata_text.o t64_maxdata_text.o t32_noloader.o t64_noloader.o t32_excess.o t64_excess.o

jhenderson (Done):

This blank line to me means the comment above is not associated with the checks that follow below. Delete it.

## The content of t32_nomaxdata_text, t64_nomaxdata_text.o aligned at 2.

jhenderson (Done):

Rather than repeating this python snippet over and over again, could you write a little python script at the end of this file, use split-file to split it into a .py file at runtime, and then execute the file with all the different input arguments?

The RUN line would end up something like:

# RUN: %python print-magic.py 262 | FileCheck --check-prefix=MAGIC32

# RUN: %python print_magic.py t_aux.a 262 | FileCheck --check-prefix=MAGIC32 %s

# RUN: %python print_magic.py t_aux.a 528 | FileCheck --check-prefix=MAGIC64 %s

## The content of t32_maxdata_text.o, t64_maxdata_text.o aligned at 2^8.

# RUN: %python print_magic.py t_aux.a 1024 | FileCheck --check-prefix=MAGIC32 %s

# RUN: %python print_magic.py t_aux.a 1536 | FileCheck --check-prefix=MAGIC64 %s

## The content of t32_noloader.o, t64_noloader.o aligned at 2.

# RUN: %python print_magic.py t_aux.a 1870 | FileCheck --check-prefix=MAGIC32 %s

# RUN: %python print_magic.py t_aux.a 2130 | FileCheck --check-prefix=MAGIC64 %s

## The content of t32_excess.o aligned at word.

# RUN: %python print_magic.py t_aux.a 2464 | FileCheck --check-prefix=MAGIC32 %s

## The content of t64_excess.o aligned at 2^12.

# RUN: %python print_magic.py t_aux.a 4096 | FileCheck --check-prefix=MAGIC64 %s
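
The alignment cases exercised by the checks above can be summarised in a short sketch. This is reconstructed only from the comments in this test file and is not the code in ArchiveWriter.cpp; `AuxHeaderInfo`, `memberAlignment`, `paddingFor` and `checkAlignmentCases` are hypothetical names, and the 4-byte word size is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical summary of the auxiliary-header facts the test varies.
struct AuxHeaderInfo {
  bool Is64Bit;                 // magic 0x1F7 (64-bit) vs 0x1DF (32-bit)
  bool HasAuxHeader;            // object has an auxiliary header at all
  bool HasAlignFields;          // header holds MaxAlignOfText/MaxAlignOfData
  std::uint16_t SecNumOfLoader; // 0 means there is no loader section
  std::uint16_t MaxAlignOfText; // log2 of the .text alignment
  std::uint16_t MaxAlignOfData; // log2 of the .data alignment
};

constexpr std::uint32_t MinAlign = 2;      // default member alignment
constexpr std::uint32_t WordAlign = 4;     // assumed 4-byte word
constexpr std::uint16_t Log2PageSize = 12; // page size 2^12

// Returns the byte alignment for the member's content.
std::uint32_t memberAlignment(const AuxHeaderInfo &Info) {
  if (!Info.HasAuxHeader || !Info.HasAlignFields || Info.SecNumOfLoader == 0)
    return MinAlign;
  const std::uint16_t Log2Align =
      std::max(Info.MaxAlignOfText, Info.MaxAlignOfData);
  // Beyond the page size: 32-bit members fall back to a word boundary,
  // 64-bit members to a page boundary.
  if (Log2Align > Log2PageSize)
    return Info.Is64Bit ? (1u << Log2PageSize) : WordAlign;
  return 1u << Log2Align;
}

// Padding needed so that `Offset` lands on a multiple of `Align`.
std::uint64_t paddingFor(std::uint64_t Offset, std::uint32_t Align) {
  return (Align - Offset % Align) % Align;
}

// The cases named in the test comments.
void checkAlignmentCases() {
  assert(memberAlignment({false, true, false, 1, 0, 0}) == 2);   // no fields
  assert(memberAlignment({false, true, true, 1, 6, 8}) == 256);  // 2^8
  assert(memberAlignment({true, true, true, 0, 14, 8}) == 2);    // no loader
  assert(memberAlignment({false, true, true, 1, 14, 8}) == 4);   // 32-bit excess
  assert(memberAlignment({true, true, true, 1, 14, 8}) == 4096); // 64-bit excess
}
```

Under these assumptions the offsets used in the RUN lines are consistent: 1024 and 1536 are multiples of 2^8, 2464 is a multiple of 4, and 4096 is a multiple of 2^12.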

## Test that the content of an XCOFF object file, which does not have an auxiliary

jhenderson (Done):

# RUN: %python -c 'f=open("t_aux.a","rb");f.seek(4096);print(f.read(2));f.close()' | FileCheck --check-prefix=MAGIC64 %s

- ## Test that the content of an XCOFF object files, which have not an auxiliary

+ ## Test that the content of an XCOFF object file, which does not have an auxiliary

## header, is aligned at 2 in a big archive.

# RUN: env OBJECT_MODE=32_64 llvm-ar -q t3.a t32_1.o t64_1.o t32_2.o t64_2.o

Or:

"Test that the content of XCOFF object files, which don't have auxiliary headers, are aligned at 2 in a big archive."

## header, is aligned at 2 in a big archive.

jhenderson (Not Done):

# RUN: %python print_magic.py t_aux.a 4096 | FileCheck --check-prefix=MAGIC64 %s

- ## Test that the content of an XCOFF object files, which does not have an auxiliary

+ ## Test that the content of an XCOFF object file, which does not have an auxiliary

## header, is aligned at 2 in a big archive.

Nit, missed earlier.

# RUN: env OBJECT_MODE=32_64 llvm-ar -q t3.a t32_1.o t64_1.o t32_2.o t64_2.o

# # RUN: %python print_magic.py t3.a 250 | FileCheck --check-prefix=MAGIC32 %s

# # RUN: %python print_magic.py t3.a 432 | FileCheck -check-prefix=MAGIC64 %s

# # RUN: %python print_magic.py t3.a 650 | FileCheck --check-prefix=MAGIC32 %s

# # RUN: %python print_magic.py t3.a 832 | FileCheck -check-prefix=MAGIC64 %s

# MAGIC64: b'\x01\xf7'

# MAGIC32: b'\x01\xdf'
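
For readers unfamiliar with the check, the two MAGIC lines above compare the two bytes found at a given archive offset with the XCOFF magic numbers used as `FLAG` values in this test (0x1DF and 0x1F7). A rough C++ equivalent of what print_magic.py reads is sketched below; `XCOFFMagic`, `magicAt` and `checkMagicDemo` are hypothetical names, and the demo reads from an in-memory stream rather than a real archive.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

enum class XCOFFMagic { Unknown, XCOFF32, XCOFF64 };

// Reads two bytes at `Offset` and interprets them as a big-endian magic.
XCOFFMagic magicAt(std::istream &In, std::streamoff Offset) {
  char Bytes[2];
  In.clear(); // reset any error state left by an earlier short read
  In.seekg(Offset, std::ios::beg);
  if (!In.read(Bytes, 2))
    return XCOFFMagic::Unknown;
  const unsigned Magic = (static_cast<unsigned char>(Bytes[0]) << 8) |
                         static_cast<unsigned char>(Bytes[1]);
  if (Magic == 0x01DF)
    return XCOFFMagic::XCOFF32;
  if (Magic == 0x01F7)
    return XCOFFMagic::XCOFF64;
  return XCOFFMagic::Unknown;
}

// Demo on a fake 8-byte buffer with a 32-bit magic at offset 2 and a
// 64-bit magic at offset 6.
void checkMagicDemo() {
  std::string Buffer(8, '\0');
  Buffer[2] = '\x01';
  Buffer[3] = static_cast<char>(0xDF);
  Buffer[6] = '\x01';
  Buffer[7] = static_cast<char>(0xF7);
  std::istringstream In(Buffer);
  assert(magicAt(In, 2) == XCOFFMagic::XCOFF32);
  assert(magicAt(In, 6) == XCOFFMagic::XCOFF64);
  assert(magicAt(In, 0) == XCOFFMagic::Unknown);
  assert(magicAt(In, 7) == XCOFFMagic::Unknown); // only one byte left
}
```

With a real archive, the same function could be given a `std::ifstream` opened in binary mode.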

--- !XCOFF

FileHeader:

MagicNumber: [[FLAG]]

Sections:

- Name: [[SECNAME]]

Flags: [ STYP_DATA ]

jhenderson (Done):

This is a near-duplicate of the previous YAML doc. Could you just parameterise the section name, like you do with the FLAG input parameter?

## The auxiliary header has neither the MaxAlignOfData nor MaxAlignOfText field.

--- !XCOFF

jhenderson (Done):

It looks like you missed that it should be "has neither" not "have neither"

FileHeader:

MagicNumber: [[FLAG]]

AuxiliaryHeaderSize: 12

AuxiliaryHeader:

Magic: 0x10B

jhenderson (Done):

Flags: [ STYP_DATA ]

- ## The auxiliary header does not have both MaxAlignOfData and MaxAlignOfText field

+ ## The auxiliary header has neither the MaxAlignOfData nor MaxAlignOfText fields.

--- !XCOFF

SecNumOfLoader: 1

Sections:

- Name: .text

Flags: [ STYP_DATA ]

## The auxiliary header has both MaxAlignOfData and MaxAlignOfText fields.

--- !XCOFF

jhenderson (Not Done):

Flags: [ STYP_DATA ]

- ## The auxiliary header has both MaxAlignOfData and MaxAlignOfText field.

+ ## The auxiliary header has both MaxAlignOfData and MaxAlignOfText fields.

--- !XCOFF

FileHeader:

MagicNumber: [[FLAG]]

AuxiliaryHeaderSize: 48

AuxiliaryHeader:

Magic: 0x10B

jhenderson (Done):

Flags: [ STYP_DATA ]

- ## The auxiliary header have both MaxAlignOfData and MaxAlignOfText field.

+ ## The auxiliary header has both MaxAlignOfData and MaxAlignOfText fields.

--- !XCOFF

SecNumOfLoader: 1

MaxAlignOfText: 6

MaxAlignOfData: 8

Sections:

- Name: .text

Flags: [ STYP_DATA ]

## The auxiliary header does not have a loader section.

--- !XCOFF

FileHeader:

MagicNumber: [[FLAG]]

AuxiliaryHeaderSize: 34

AuxiliaryHeader:

Magic: 0x10B

SecNumOfLoader: 0

MaxAlignOfText: 14

MaxAlignOfData: 8

Sections:

- Name: .text

Flags: [ STYP_DATA ]

## The auxiliary header has both MaxAlignOfData and MaxAlignOfText fields

## but max(MaxAlignOfData, MaxAlignOfText) exceeds the page size(2^12).

--- !XCOFF

jhenderson (Done):

## The auxiliary header has both MaxAlignOfData and MaxAlignOfText fields

- ## but max(MaxAlignOfData,MaxAlignOfText) excess the page size(2^12).

+ ## but max(MaxAlignOfData, MaxAlignOfText) exceeds the page size(2^12).

--- !XCOFF

FileHeader:

MagicNumber: [[FLAG]]

AuxiliaryHeaderSize: 48

AuxiliaryHeader:

jhenderson (Done):

Flags: [ STYP_DATA ]

- ## The auxiliary header have both MaxAlignOfData and MaxAlignOfText field but excess the page size.

+ ## The auxiliary header has both MaxAlignOfData and MaxAlignOfText fields but excess the page size.

--- !XCOFF

I'm not sure what "but excess the page size" means. Should it be "but they exceed the page size" or something?

Magic: 0x10B

SecNumOfLoader: 1

MaxAlignOfText: 14

MaxAlignOfData: 8

Sections:

- Name: .text

Flags: [ STYP_DATA ]


Revision Contents

Diff 556046
