This is an archive of the discontinued LLVM Phabricator instance.

[AIX] Align the content of an xcoff object file which has auxiliary header in big archive.
ClosedPublic

Authored by DiggerLin on Feb 27 2023, 6:51 AM.

Details

Summary

if the member file is XCOFF object file and has auxiliary header, the content of the member file need to be aligned at the
MAX(maximum alignment of .text , maximum alignment of .data). The "maximum alignment of .text" and "maximum alignment of .data" are two
field of auxiliary header of XCOFF object file.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2023, 6:51 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
DiggerLin requested review of this revision.Feb 27 2023, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2023, 6:51 AM

I've started reviewing, but have run out of time for this for now. One high-level question: regular Unix archive member alignment is done at the end of a member (I believe), rather than the start of the next, meaning that the final member will have tail padding, if needed. I take it that this isn't the case here, since the alignment is purely to ensure aligned data in the object?

llvm/lib/Object/ArchiveWriter.cpp
564–566

This should be referring to the Big Archive format, right, not the OS?

Other suggestions in the inline edit.

567

Two nits, and one more significant point.

  1. No need for const & for StringRef, which is intended to be copied.
  2. ObjStringRef -> Obj.
  3. This function appears to be reading in the file and parsing it just to get the alignments. However, presumably this isn't the only place where we have the fully parsed object, since at some point you have to know what to write in the object file in the first place, right? Wouldn't it make more sense to identify and record this alignment then?
574

Seems like this should report the error, not just ignore it...

590

No need for else after return.

stephenpeckham added inline comments.Mar 2 2023, 7:13 AM
llvm/lib/Object/ArchiveWriter.cpp
564–566

The alignment is not a requirement of the Big Archive format. It's required by the OS for 64-bit members and recommended for 32-bit members. AIX allows shared objects to be archive members. When archive members are loaded, they are mapped into memory. If the members aren't aligned properly in the archive, they won't be aligned in memory. Both .text and .data are mapped, so the required member alignment takes into account both the .text and .data alignment. Alignment is not necessary for members that are not loadable.

583

Only loadable objects need to be aligned. Onc requirement for a loadable module is the presence of a loader section. The o_snloader field in the auxiliary header can be checked.

882

It's possible to have a loadable object with a very large text or data alignment. A sanity check would be useful here. The AIX ar command caps the alignment at 2^12 (the typical PAGESIZE on an AIX system).

DiggerLin updated this revision to Diff 503149.Mar 7 2023, 2:02 PM
DiggerLin marked 4 inline comments as done.
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
567

yes

The alignment calculations look good now, including handling special cases.

jhenderson added inline comments.Mar 16 2023, 2:23 AM
llvm/lib/Object/ArchiveWriter.cpp
335

No need for explicit assignment of nullptr - unique_ptr's default constructor leaves it in an empty state.

544–562

At the moment, I'm struggling to follow parts of this comment so I'd like to propose rewording it as follows:

"AIX Big Archives may contain shared object members. The AIX OS requires these members to be aligned if they are 64-bit and recommends it for 32-bit members. This ensures that when these members are loaded they are aligned in memory."

I think the rest of the comment can be moved into the method body, and I'll comment as appropriate.

555

Perhaps you could flip this on its head. Something like:

XCOFFObjectFile *XCOFFObj = dyn_cast_or_null<XCOFFObjectFile>(SymObj);
if (!XCOFFObj)
  // Replace this comment with a comment that says why "2" is the right value.
  return 2;
...
556

I think I'd make this a free-standing function. The body is long and it doesn't capture any variables, so I don't think making it a lambda is particularly helpful for readability.

I'd also rename Size to be clearer what size it represents (e.g. "AuxHeaderSize" if that is correct).

558

Add a comment like: "If the member doesn't have an auxiliary header, it isn't a loadable object and so doesn't need aligning."

561–562

It's not clear to me why if it is missing one or other of these that 2 is the right choice. Why is it not the other alignment value?

563–564

Please use static_cast or reinterpret_cast, not C-style casts. That being said, I'm struggling to follow the logic here. Is this essentially testing if the Header is too small to contain the MaxAlignOfData field? If so, is that actually a permitted case? The + 2 in particular is throwing me off though.

567
571–573
575

This value of 12 is a magic number that is rather meaningless to a reader of the code. Please stick it in a named constant somewhere.

Also, is AlignSize (and MaxAlignSize) really an appropriate name for the variable? An alignment value isn't a size, so unless you are aligning a size field or something, it doesn't really make sense as a name.

This line also needs a comment explaining what it is doing and why.

579
748

Why has this comment changed only to ADD a typo??

750–751

Why has this variable been renamed?

794

I'm not sure I understand the changes to this loop. Why do you need to know anything about the next member to know how much padding the current member requires?

DiggerLin updated this revision to Diff 510127.Mar 31 2023, 1:56 PM
DiggerLin marked 14 inline comments as done.
DiggerLin added inline comments.Mar 31 2023, 1:57 PM
llvm/lib/Object/ArchiveWriter.cpp
335

it need it , otherwise there is compiler error on
line 327
"return {{}, std::move(Header), Names, Pad ? "\n" : ""};"

as "missing field 'SymFile' initializer [-Werror,-Wmissing-field-initializers]"

750–751

I think "Pre" is more concise for the abbreviation of previous than "Prev" , I search from internet , the "Prev" is better , I change back.

794

there is field "char ar_nxtmem[20]" /* Next member offset-decimal */"
in the "File Member Header" in big archive
, we need to calculate the padding before the header if there is.

882

the pos points to the begin of the header of member file, it not be aligned, only the data of the member file is aligned.

I think I now understand why you've changed the NewMember loop stuff at least, but I wanted to raise one thing before I make any further comments: the spec https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big makes no comment about all this complicated additional alignment stuff, and refers simply to aligning on even byte boundaries. From a traditional GNU-like archive format, this makes sense: the ar tool is designed for creating static archvies that are linked by the static linker. Loadability is not a thing that needs thinking about.

Are you suggesting that AIX Big Archives can be used to store shared objects in an archive, and then that the runtime loader can directly load shared objects from these archives?

llvm/lib/Object/ArchiveWriter.cpp
335

Ah, I didn't see you used it like that. In which case, this is fine.

506

This and the next constant are only applicable to Big Archives/AIX, but that isn't communicated by the variable name. Could this one be renamed Log2OfAIXPageSize?

508
512

Similar to above, perhaps MinBigArchiveMemDataAlign?

515

Function names start with lower-case letters...

518

Don't specifically say "even" here, because the variable name is talking about "minimum".

523

This comment doesn't explain the "why" of what you're doing. I think it's important that you explain this for this case.

This was the point I was trying to get across with my previous comment, related to the casting style. Why is this the right thing to do here (in particular why the "2")?

534–535
567

Any reason you didn't adopt this comment suggestion I made?

(You can change "so align at 2." in my suggestion to "so align at the minimum value."

the spec https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big makes no comment about all this complicated additional alignment stuff, and refers simply to aligning on even byte boundaries. From a traditional GNU-like archive format, this makes sense: the ar tool is designed for creating static archvies that are linked by the static linker. Loadability is not a thing that needs thinking about.

Are you suggesting that AIX Big Archives can be used to store shared objects in an archive, and then that the runtime loader can directly load shared objects from these archives?

Yes, the AIX loader loads shared objects directly from archives. This simplifies the search for shared objects, because both 32- and 64-bit shared objects can be in the same archive. In addition, an archive can contain multiple shared objects. It is more efficient for the system loader to load archive members if the members are aligned in the archive. In fact, the system loader will refuse to load 64-bit members that are not aligned properly.

It's true that the documentation for the archive file format does not mention the alignment requirement. I don't think the writers of the documentation anticipated that a third party would be writing its own archive management tools.

DiggerLin updated this revision to Diff 510500.Apr 3 2023, 7:55 AM
DiggerLin marked 6 inline comments as done.

thanks for comments.

jhenderson added inline comments.Apr 14 2023, 1:00 AM
llvm/lib/Object/ArchiveWriter.cpp
523
527

I still don't understand specifically why "2". Should this use some form of offsetof to get the actual position in the struct or similar?

720

We've had this sort of discussion before: you can't use static local variables like this that will change each time this function is called, because this function could be called from multiple threads.

793

Please re-review the LLVM coding standards. You keep making mistakes, despite being told before, that are covered in the coding standards.

851–864

Some more comments explaining why you're doing what you're doing in this block would be good.

882–883

Is this value correct for the last member in an archive?

890–891

If I'm not mistaken, the implication of this code is that you will no longer be able to store non-symbolic-files in BigArchives. Is that intended (and if so, is it correct)? For other formats, you can store non-symbolic files as archive members. They just don't contribute anything to the symbol table.

890–894

If I'm not mistaken, you could avoid a lot of this duplicate logic about the SymbolicFile by pulling it out of the isAIXBigArchive code. Something like this outline:

for (auto M = NewMembers.begin(); M < NewMembers.end(); ++N) {
  if (NeedSymbols || isAIXBigArchive(Kind)) {
    if (M == NewMembers.begin() {
      CurSymFile = // load symbolic file for first member
    } else {
      CurSymFile = std::move(NextSymFile);
    }
    if (M + 1 != NewMembers.end()) {
      NextSymFile = // load symbolic file for next member
    }

    // Do all of loop logic
  }
}

You'll need to handle non-symbolic files somehow, perhaps by just resetting the relevant std::unique_ptr to leave it empty, and then checking whether the pointer is empty before trying to read it.

DiggerLin updated this revision to Diff 514982.Apr 19 2023, 9:09 AM
DiggerLin marked 7 inline comments as done.

address comment, thanks for James's comment .

DiggerLin added inline comments.Apr 19 2023, 9:10 AM
llvm/lib/Object/ArchiveWriter.cpp
527

yes.

720

thanks for let me know.

793

sorry for careless, actually I know that ++M is required by LLVM coding standard.

882–883

yes, the value is correct. the NextOffset of last file member will be point to "Member Table" . and there is not a special requirement of content of "Member Table"

890–891

in the function

static Expected<std::unique_ptr<SymbolicFile>>
getSymbolFile(MemoryBufferRef Buf, LLVMContext &Context) {
  std::unique_ptr<object::SymbolicFile> Obj;
  const file_magic Type = identify_magic(Buf.getBuffer());
  // Treat non symbolic file types as nullptr.
  if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
    return nullptr;

it treats non symbolic file types as nullptr. it do not return a error.

890–894

in the if (M == NewMembers.begin()) , it not only open CurSymFile but also calculate MemHeadPadSize

in the if ((M + 1) != NewMembers.end()), it not only open CurSymFile but also calculate NextMemHeadPadSize but also calculate the NextMemHeadPadSize

even if I do as your suggestion, I still need to if (M == NewMembers.begin()) and if ((M + 1) != NewMembers.end()) for MemHeadPadSize and NextMemHeadPadSize later

You'll need to handle non-symbolic files somehow, perhaps by just resetting the relevant std::unique_ptr to leave it empty, and then checking whether the pointer is empty before trying to read it.

function getSymbolFile already return nullptr for non-symbolic files.
stephenpeckham accepted this revision.May 15 2023, 7:30 AM
This revision is now accepted and ready to land.May 15 2023, 7:30 AM
jhenderson added inline comments.Jun 12 2023, 12:18 AM
llvm/lib/Object/ArchiveWriter.cpp
824–825

(and make sure the comment respects the column limit properly - I haven't attempted to with my edit)

890–894

My complaint is that structurally, this and the above block are largely the same. Yes, one does a bit more than the other, but ultimately, the two are structurally identical.

You could refactor the code to avoid this structural duplication, without too much difficulty, using my suggested code above. To do the extra work regarding calculate the padding sizes, simply add "if (isAIXBigArchive(Kind)) clauses:

for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
  if (NeedSymbols || isAIXBigArchive(Kind)) {
    if (M == NewMembers.begin() {
      CurSymFile = // load symbolic file for first member
      if (isAIXBigArchive(Kind)) {
        // Do stuff with padding, as needed.
      }
    } else {
      CurSymFile = std::move(NextSymFile);
    }
    if (M + 1 != NewMembers.end()) {
      NextSymFile = // load symbolic file for next member
      if (isAIXBigArchive(Kind)) {
        // Do stuff with padding, as needed.
      }
    }

    // Do all of loop logic (or do some before the ifs and some after).
  }
}
DiggerLin updated this revision to Diff 543996.Jul 25 2023, 8:42 AM
DiggerLin marked 3 inline comments as done.
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
890–894

if I change as your suggestion,
it only reduce the code of

if (!isAIXBigArchive(Kind)) {
       Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
           getSymbolFile(Buf, Context);
       if (!SymFileOrErr)
         return createFileError(M->MemberName, SymFileOrErr.takeError());
       CurSymFile = std::move(*SymFileOrErr);
     }

I still need to put variables NextOffset and OffsetToMemData under the if (NeedSymbols || isAIXBigArchive(Kind))

and it will several

if (isAIXBigArchive(Kind)) under if (NeedSymbols || isAIXBigArchive(Kind)) {

we will have duplication code of
printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M, ModTime, Size);

 if (NeedSymbols || isAIXBigArchive(Kind)) {
   ...
  if(!isAIXBigArchive(Kind)) 
    printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M, ModTime, Size);
}
 else {
     printMemberHeader(Out, Pos, StringTable, MemberNames, Kind, Thin, *M, ModTime, Size);
  }

I do not think the logic of your suggestion is clearer than the current code. If you strong suggest , I can change as your suggestion.

DiggerLin updated this revision to Diff 545395.Jul 29 2023, 7:16 PM

rebase code

jhenderson added inline comments.Jul 31 2023, 2:36 AM
llvm/lib/Object/ArchiveWriter.cpp
866–867
890–894

Either you've misunderstood my suggestion or I've missed something. I'm pretty sure I haven't missed anything, as I've just experimented with reorganising the code locally, and came up with a perfectly reasonable looking solution without even referring back to my old one, yet after looking back at it, it looks fairly similar.

The changes I made were as follows:

for (...) {
  ... after the existing file size limit check ...

  if (NeedSymbols || isAIXBigArchive(Kind)) {
    auto SetNextSymFile = [&NextSymFile,
                           &Context](MemoryBufferRef Buf,
                                     StringRef MemberName) -> Error {
      Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
          getSymbolicFile(Buf, Context);
      if (!SymFileOrErr) {
        return createFileError(MemberName, SymFileOrErr.takeError());
      }
      NextSymFile = std::move(*SymFileOrErr);
      return Error::success();
    };

    if (M == NewMembers.begin())
      if (Error Err = SetNextSymFile(Buf, M->MemberName))
        return std::move(Err);
    CurSymFile = std::move(NextSymFile);

    if (M + 1 != NewMembers.end())
      if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(), (M + 1)->MemberName))
        return std::move(Err);
  }

  if (isAIXBigArchive(Kind)) {
    uint64_t OffsetToMemData = Pos + sizeof(object::BigArMemHdrType) +
                               alignTo(M->MemberName.size(), 2);

    if (M == NewMembers.begin()) {
      MemHeadPadSize = alignToPowerOf2(OffsetToMemData,
                                       getMemberAlignment(CurSymFile.get())) -
                       OffsetToMemData;
    } else {
      MemHeadPadSize = NextMemHeadPadSize;
    }

    ... update Pos, calculate NextOffset etc as before, then write header ...
  } else {
    printMemberHeader(...);
  }
  Out.flush();

  std::vector<unsigned> Symbols;
  if (NeedSymbols) {
    Expected<std::vector<unsigned>> SymbolsOrErr =
        getSymbols(CurSymFile.get(), Index, SymNames, SymMap);
    if (!SymbolsOrErr)
      return createFileError(M->MemberName, SymbolsOrErr.takeError());
    Symbols = std::move(*SymbolsOrErr);
    if (CurSymFile) // Can we remove this if check and set HasObject to true where CurSymFile is set?
      HasObject = true;
  }

  ... update Pos and Ret ...
}

I think this logic flows quite nicely, and certainly better than the current suggest. It avoids multiple different call sites to getSymbolicFile, for the cost of a few simple ifs. It also avoids any significant nested ifs (i.e. ones where the inner if is a large block), which helps readability of the code.

Aside: is the check on CurSymFile for the HasObject setting necessary? This potentially could be simplified, but I haven't got the time to follow this logic.

DiggerLin marked 2 inline comments as done.

address comment

llvm/lib/Object/ArchiveWriter.cpp
890–894

thanks , I follow your suggestion.

Just as a heads-up, I'm off for the rest of the week. Hopefully I'll be able to get to any final points either before the end of my workday or later next week.

llvm/lib/Object/ArchiveWriter.cpp
894–895

Did you see my aside about HasObject at the end of my long inline comment? Can the logic around HasObject be simplified in the latest version. I've not followed it enough to be confident either way.

DiggerLin marked an inline comment as done.Aug 1 2023, 6:44 AM
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
894–895

Can the logic around HasObject be simplified in the latest version.

I do not think HasObject be simplified, in the loop

   for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {  
.....
    if (CurSymFile)
       HasObject = true;
 ....
}
.....
   if (HasObject && SymNames.tell() == 0 && !isCOFFArchive(Kind))
   SymNames << '\0' << '\0' << '\0';
 return Ret;

if any of member is symbolicFile and SymName is empty and !isCOFFArchive(Kind)) , it will

SymNames << '\0' << '\0' << '\0';

I do not think we can simplify it.

DiggerLin marked an inline comment as done.Aug 1 2023, 6:45 AM

Just as a heads-up, I'm off for the rest of the week. Hopefully I'll be able to get to any final points either before the end of my workday or later next week.

Enjoy your vacation.

I'll be honest, I'm not convinced the test coverage looks to be anywhere near comprehensive enough. However, I also haven't attempted to review it versus the code changes to check how much of the new code is actually covered. I'd suggest you consider using a code coverage tool to see how much coverage there is of your new code.

llvm/lib/Object/ArchiveWriter.cpp
508–511

Could you reflow this comment, please? It looks like some of the word wrapping is happening earlier than needed.

A quick look at many of the other new comments in this file makes it look like they have the same issue. Please review all your comments and reflow as necessary.

544–561

I can't remember whether we decided it should be "Big Archive" or "big archive". Either way, you should be consistent in all your comments and use one or the other in every case, rather than a mixture (I think "big archive" is the norm elsewhere in this patch).

681

Could you please move this back to where it was (immediately before the if (SymMap). There's no reason to move it, and by moving it you've left it further from its point of use than it needs to be.

830–832

No need for the braces here.

1173

It may be slightly more efficient to do something like:

OS << std::string(M.PreHeadPadSize, '\0');

It's certainly a little more elegant. Alternatively, you could use std::fill_n. There are good explanations of both these at https://stackoverflow.com/a/11421689.

DiggerLin added a comment.EditedAug 14 2023, 7:08 AM

I'll be honest, I'm not convinced the test coverage looks to be anywhere near comprehensive enough. However, I also haven't attempted to review it versus the code changes to check how much of the new code is actually covered. I'd suggest you consider using a code coverage tool to see how much coverage there is of your new code.

In the test big-archive-xcoff-auxi-head-align.test , it test whether the XCOFF object data member which has auxiliary header  align correctly  based on the information of  auxiliary header. it is enough here.  In AIX OS since build bot use  CMake 3.22, it use llvm-ar instead of  AIX OS ar.  it will test the functionality too. (we have to add the -DCMAKE_AR=/usr/bin/ar when we compile, since llvm-ar do not align XCOFF object correctly for big archive in AIX os ). after the the patch commit, we will use the `llvm-ar` instead of  the AIX `ar`
DiggerLin updated this revision to Diff 549994.Aug 14 2023, 9:36 AM
DiggerLin marked 3 inline comments as done.
DiggerLin added inline comments.Aug 14 2023, 9:40 AM
llvm/lib/Object/ArchiveWriter.cpp
1173

we write M.PreHeadPadSize of '\0' , using std::string(M.PreHeadPadSize, '\0') , the string is a empty string , the OS << std::string(M.PreHeadPadSize, '\0') do not output M.PreHeadPadSize of '\0'

DiggerLin updated this revision to Diff 550070.Aug 14 2023, 1:55 PM

rebase the patch

jhenderson added inline comments.Aug 15 2023, 1:57 AM
llvm/lib/Object/ArchiveWriter.cpp
1173

Did you actually try my suggestion? Your statement is wrong. I've tried it with some simple code and it works fine - OS << std::string(10, '\0') writes 10 null bytes to the output.

std::string(M.PreHeadPadSize, '\0') constructs a std::string that contains M.PreHeadPadSize null bytes. It is NOT an empty string (although using .c_str() will make it look like it is).

I'll be honest, I'm not convinced the test coverage looks to be anywhere near comprehensive enough. However, I also haven't attempted to review it versus the code changes to check how much of the new code is actually covered. I'd suggest you consider using a code coverage tool to see how much coverage there is of your new code.

In the test big-archive-xcoff-auxi-head-align.test , it test whether the XCOFF object data member which has auxiliary header align correctly based on the information of auxiliary header. it is enough here. In AIX OS since build bot use CMake 3.22, it use llvm-ar instead of AIX OS ar. it will test the functionality too. (we have to add the -DCMAKE_AR=/usr/bin/ar when we compile, since llvm-ar do not align XCOFF object correctly for big archive in AIX os ). after the the patch commit, we will use the llvm-ar instead of the AIX ar

For your code to land in LLVM, it needs to have appropriate levels of testing in LLVM (i.e. lit tests and gtest unit tests). Otherwise, an LLVM developer could easily make a change that breaks the existing behaviour in a subtle way. Relying on it being a system archiver on some cases will not provide the level of coverage you need IN LLVM. Bugs will creep in and will impact your system users, which is of no benefit to anyone.

DiggerLin updated this revision to Diff 550342.Aug 15 2023, 8:25 AM
DiggerLin marked 2 inline comments as done.

added new test scenarios .

DiggerLin marked 2 inline comments as done.Aug 16 2023, 6:42 AM
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
1173

it work, thanks

jhenderson requested changes to this revision.Aug 23 2023, 12:23 AM

I've added a few more nits. I've also gone through and highlighted a number of cases that I'm pretty confident don't have existing test coverage. I shouldn't have had to go through these bits myself and highlight all of this - you should have done this yourself, prior to the patch even going up for review. Requesting changes to clear the "Accepted" state.

llvm/lib/Object/ArchiveWriter.cpp
526–527

This doesn't seem to be covered?

531

SecNumOfLoader is unsigned, right? So it can never be less than 0 by definition...

531–532

This probably isn't covered?

540

This needs test cases for the text alignment being bigger and the data alignment being bigger. I can't tell from the existing tests whether both cases are covered.

541

Are both parts of this ternary covered?

550

Nit: delete this blank line - the if below is strongly linked to the previous line.

555
838

Test case needed.

845

Test case needed.

llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
5 ↗(On Diff #550342)
6 ↗(On Diff #550342)

Delete this blank line - the comment above is associated with the following test case, but the blank line suggests it either doesn't, or that it applies to the whole test file.

8 ↗(On Diff #550342)

Don't rely on inputs in another part of the test tree. It is quite possible that these files will move/be deleted etc, and people won't expect that to impact tests in another part of the testing tree. Instead, you should create tehse files using yaml2obj.

The other aspect of this is that I have no way of telling that the inputs you're using actually have the properties that you are trying to test for (without inspecting the binaries). Having the yaml2obj form of them would enable this.

12 ↗(On Diff #550342)

Nit: prefer --check-prefix over -check-prefix in new tests. Applies throughout.

22 ↗(On Diff #550342)

This blank line to me means the comment above is not associated with the checks that follow below. Delete it.

This revision now requires changes to proceed.Aug 23 2023, 12:23 AM
DiggerLin updated this revision to Diff 553652.Aug 25 2023, 3:37 PM
DiggerLin marked 10 inline comments as done.

add new test scenario

llvm/lib/Object/ArchiveWriter.cpp
526–527

Are we need a test case to cover all the statement ?

838

we just refactor the code here, not adding a new functionality here, I do not think we need a new test case here.

845

some reason as above.

DiggerLin updated this revision to Diff 553655.Aug 25 2023, 3:45 PM
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
526–527

I added a test scenario for it anyway.

jhenderson added inline comments.Aug 29 2023, 2:34 AM
llvm/lib/Object/ArchiveWriter.cpp
526–527

Yes, all statements should be covered by tests. Otherwise, you've got nothing that will show that this code is functioning correctly.

838

Although I agree that the code to load a symbolic file is just a refactoring of existing code, this use of that code is a new call site, where an error needs checking in a test. If there's an existing test that hits this specific piece of code, then that's fine, but if there isn't, it needs a new test. Otherwise, there would be no test that shows that the error is properly handled in this case.

Same goes below.

llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
23 ↗(On Diff #553655)

Rather than repeating this python snippet over and over again, could you write a little python script at the end of this file, use split-file to split it into a .py file at runtime, and then execute the file with all the different input arguments?

The RUN line would end up something like:

# RUN: %python print-magic.py 262 | FileCheck --check-prefix=MAGIC32
36–37 ↗(On Diff #553655)

Or:

"Test that the content of XCOFF object files, which don't have auxiliary headers, are aligned at 2 in a big archive."

54 ↗(On Diff #553655)

This is a near-duplicate of the previous YAML doc. Could you just parameterise the section name, like you do with the FLAG input parameter?

61 ↗(On Diff #553655)
73 ↗(On Diff #553655)
101 ↗(On Diff #553655)

I'm not sure what "but excess the page size" means. Should it be "but they exceed the page size" or something?

DiggerLin marked 6 inline comments as done.

address comment

DiggerLin marked an inline comment as done.Aug 29 2023, 10:53 AM
jhenderson added inline comments.Aug 30 2023, 12:47 AM
llvm/lib/Object/ArchiveWriter.cpp
838

Okay, thanks. I agree that this check on this line is covered by the existing test. However, archive-malformed-object.test doesn't cover the case where a second or later member is bad, I believe, which means that the check and report of the error at lines 841-843 in this current version of the patch aren't covered still.

Also, that test only covers it for the case where symbol tables are needed (i.e. when llvm-ar without the "S" option is used). Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.

You could cover both missing cases sufficiently by adding the following two test cases somewhere:

llvm-ar rcS archive.a bad.o
llvm-ar rcS archive.a good.o bad.o

where archive.a is a big archive, bad.o is a file that will cause getSymbolicFile to fail, and good.o is a file that it will work with. (Obviously you'd rename these files in the real test)

llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
56 ↗(On Diff #554424)

It looks like you missed that it should be "has neither" not "have neither"

97 ↗(On Diff #554424)
DiggerLin updated this revision to Diff 555069.Aug 31 2023, 8:54 AM
DiggerLin marked 3 inline comments as done.

added more test scenario and address comment.

DiggerLin added inline comments.Aug 31 2023, 8:56 AM
llvm/lib/Object/ArchiveWriter.cpp
838

Okay, thanks. I agree that this check on this line is covered by the existing test. However, archive-malformed-object.test doesn't cover the case where a second or later member is bad, I believe, which means that the check and report of the error at lines 841-843 in this current version of the patch aren't covered still.

I will add a new test scenario to cover it.

Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.

what is the purpose the test ? do you want to test whether the code go into the following block
when isAIXBigArchive(Kind) of if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) is true, Or test the functionality of the following block(we already has test scenario for it)

{auto SetNextSymFile = [&NextSymFile,
    .....
    if ((M + 1) != NewMembers.end())
        if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(),
                                       (M + 1)->MemberName))
          return std::move(Err);
    }

if we want to check whether going into above block, I think the align functionality will not work without the code go into the above block. we already has test scenario to test the align of big archive. I add the test scenario anyway.

jhenderson added inline comments.Sep 1 2023, 12:32 AM
llvm/lib/Object/ArchiveWriter.cpp
838

what is the purpose the test ?

Unlike the last few tests I've requested, which test very specific code paths, this is more of a high-level-thinking test (often called a "black box test"), i.e. one where we don't know what the details of the code are. Specifically, the aim of the test is to show that "For Big Archive, if an input object can't be loaded, we correctly report an error." Such high-level tests are useful in addition to the low-level coverage tests, because they are less likely to be invalidated by subsequent changes to the code (for example, if somebody changed this code path to be big archive only, because they decide to handle symbol tables differently, some of the test coverage provided by the existing test would no longer cover this area of code).

High-level tests can often provide low-level coverage at the same time, and similarly low-level coverage tests can cover high-level topics at the same time (e.g. the other test you've recently modified covers "For archives that require a symbol table, if an input object can't be loaded, we correctly report an error"). A consequence of this dual-purpose is that sometimes there is overlap in terms of raw coverage that tests provide, as you've seen here.

llvm/test/Object/archive-malformed-object.test
14

I don't think this test should use an XCOFF object. XCOFF is not a well-known format, so using it purely to provide a "good" first object will make it look like it has been chosen very deliberately, which confuses the purpose of the test. Better would be to have another bitcode object that isn't malformed as the first object. You can then have bad.bc and good.bc, which more clearly indicates what is important.

16

I'm pretty sure you don't want the S here? Compare this command to the one above. If you are using a non-big archive format (which will be the default on many people's systems), then the symbols are only loaded if S is not present. If the symbols aren't loaded, the bitcode file won't result in an error (I think).

DiggerLin marked 2 inline comments as done.Sep 1 2023, 10:00 AM
DiggerLin added inline comments.
llvm/test/Object/archive-malformed-object.test
16

I need to S to test your comment , the archive format always depend on the first object file of llvm-ar argument(no matter the OS). when the xcoff object file, the archive is big archive.

Ideally, you would also have a test case that shows that when no symbols are requested (i.e. llvm-ar S<more options> test.a xcoff.o), an error is also reported for an invalid member of a big archive.

DiggerLin updated this revision to Diff 555423.Sep 1 2023, 10:01 AM
DiggerLin marked an inline comment as done.

address comment

DiggerLin updated this revision to Diff 555424.Sep 1 2023, 10:03 AM

delete an empty line

jhenderson added inline comments.Sep 4 2023, 12:43 AM
llvm/test/Object/archive-malformed-object.test
16

Oh, okay I misremembered how the format selection works. Using --format=bigarchive is probably a good idea anyway for clarity.

My expectation with the suggestion you've quoted was that xcoff.o would be the file that couldn't be read, rather than your test case mixing bitcode and xcoff files. I think mixing formats might be a little bit confusing, so I'd either have one/two xcoff files (so the same pair of cases as the invalid bitcode with symbol table cases above, but using xcoff files), or even just the same as the previous two cases, but with S (and the --format=bigarchive option), using the same bitcode files. I have no particular preference (but it probably is a good idea to include both "first" and "not first" cases for completeness).

DiggerLin updated this revision to Diff 555917.Sep 5 2023, 12:55 PM

address comment

jhenderson accepted this revision.Sep 6 2023, 12:31 AM

LGTM, with 2 comment nits and one small test issue that should be addressed before merging.

llvm/test/Object/archive-malformed-object.test
21–23

Do we need to be explicit about gnu format? If the archive format is derived from the first member, this won't be big archvie, right? If we don't need to be explicit, please remove the --format option, so that it can capture more cases. If for whatever reason it would be bigarchive without the format option, it's fine to leave as-is.

In either case, I suggest changing the comment to say "not required for formats other than the big archive format." as it's not specifically gnu format here that's important.

llvm/test/tools/llvm-ar/big-archive-xcoff-align.test
38 ↗(On Diff #555917)

Nit, missed earlier.

68 ↗(On Diff #555917)
This revision is now accepted and ready to land.Sep 6 2023, 12:31 AM
DiggerLin marked an inline comment as done.Sep 6 2023, 8:08 AM
DiggerLin added inline comments.
llvm/test/Object/archive-malformed-object.test
21–23

input.o is malformed object file, the archive format will depend on the getDefaultKindForHost(). So the --format=gnu is need here.