This is an archive of the discontinued LLVM Phabricator instance.

[AIX] support 64bit global symbol table for big archive
ClosedPublic

Authored by DiggerLin on Jan 24 2023, 8:28 AM.

Details

Summary

In big archive , there is 32bit global symbol table and 64 bit global symbol table. llvm-ar only support 32bit global symbol table this moment, we need to support the 64 bit global symbol table.

https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big

Global Symbol Tables

Immediately following the member table, the archive file contains two global symbol tables. The first global symbol table locates 32-bit file members that define global symbols; the second global symbol table does the same for 64-bit file members. If the archive has no 32-bit or 64-bit file members, the respective global symbol table is omitted. The strip command can be used to delete one or both global symbol tables from the archive. The fl_gstoff field in the fixed-length header contains the offset to the 32-bit global symbol table, and the fl_gst64off contains the offset to the 64-bit global symbol table.

Diff Detail

Event Timeline

DiggerLin created this revision.Jan 24 2023, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 8:28 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
DiggerLin requested review of this revision.Jan 24 2023, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 8:28 AM

I'm out of time for now. Please take a look at my inline comments and I'll come back to this at some point after you've addressed them (I have a big backlog to catch up on, so it may be a while).

llvm/include/llvm/Object/Archive.h
413

I think it's unlikely that this buffer is ever going to be "small", so it seems likely that it would just be better to store this a std::string ultimately.

llvm/lib/Object/Archive.cpp
1358

These new returns seem like they should be a separate patch. They aren't really anything to do with 64-bit global symbol table support.

1369
1387–1389

I would avoid long lambdas like this, as they make it hard to follow the code flow. Instead, I'd pull it into a separate function. You can have it return Error too, rather than passing in the error as a reference.

1407
1414
1415

This code all feels a bit too procedural. I think you could make things a little neater by introducing a small SymbolTableReader class. Something like the following rough outline:

class SymbolTableReader {
public:
  void readTable(...) {
    assert(MergedSymbolOffsets.empty() && MergedStringTables.empty()); // readTable should only be called before the getters.
    SymbolOffsets.push_back(...); // Add the offsets for this individual table
    StringTable.push_back(...); // Add the strings for this individual table.
  }

  StringRef getSymbolOffsetTable() const {
    if (SymbolOffsets.size() == 1)
      return SymbolOffsets[0];
    if (!MergedSymbolOffsets.empty())
      return MergedSymbolOffsets;

    uint64_t SymCount = 0;
    for (StringRef SymbolOffsetTable : SymbolOffsets)
      SymCount += /* read sym count */;
    raw_svector_ostream Stream(MergedSymbolOffsets);
    // Write the SymCount to the stream.
    // Loop over and write the symbol offsets to the stream.
    return MergedSymbolOffsets;
  }

  StringRef getStringTable() const {
    if (StringTables.size() == 1)
      return StringTables[0];
    if (!MergedStringTables.empty())
      return MergedStringTables;

    // Combine all the strings in StringTables and store in MergedStringTables
    return MergedStringTables;
  }

private:
  StringRefVector SymbolOffsets;
  StringRefVector StringTables;

  mutable std::string MergedSymbolOffsets;
  mutable std::string MergedStringTables;
};
1428

Again, I'd prefer this to be a separate function.

llvm/lib/Object/ArchiveWriter.cpp
470

The fact that there's another is64BitSymbolFile() should tell you that you need to reorganise some code to avoid code duplication.

DiggerLin marked 3 inline comments as done.Feb 1 2023, 12:22 PM
DiggerLin added inline comments.
llvm/include/llvm/Object/Archive.h
413

there are null characters between each symbol name to separate the symbol names in the global symbol table, I do not think we can use the std::string to store it.

and the base class of SmallString is SmallVector, According to the comment in the SmallVector.h, "SmallVectors are used for buffering bitcode output - which can exceed 4GB.",

llvm/lib/Object/ArchiveWriter.cpp
470

I createed a new NFC patch (https://reviews.llvm.org/D143097) which will add a new virtual API Is64bit() for SymbolicFile

DiggerLin marked 2 inline comments as done.Feb 1 2023, 12:23 PM

I'm out of time for now. Please take a look at my inline comments and I'll come back to this at some point after you've addressed them (I have a big backlog to catch up on, so it may be a while).

Thanks a lot for your time to review.

jhenderson added inline comments.Feb 2 2023, 12:53 AM
llvm/include/llvm/Object/Archive.h
413

there are null characters between each symbol name to separate the symbol names in the global symbol table, I do not think we can use the std::string to store it.

std::string is perfectly capable of storing null bytes. You just can't use the std::string(const char *) constructor to build it (you need to use one which takes a size too).

and the base class of SmallString is SmallVector, According to the comment in the SmallVector.h, "SmallVectors are used for buffering bitcode output - which can exceed 4GB.",

Have you read the Programmer's Manual? FWIW, I couldn't figure out from that which is the more appropriate to use. I guess in the end, it'll depend on how you end up populating it as to which is better.

DiggerLin updated this revision to Diff 494682.Feb 3 2023, 10:19 AM
DiggerLin marked 6 inline comments as done.

address comment and add bit code into test case.

llvm/lib/Object/Archive.cpp
1358

OK, I will do it.

1415

when class Archive , it need set two member

StringRef SymbolTable;
 StringRef StringTable;

for symbol iterating.

we do not need a variable "SymbolOffsets" here.

and for getStringTable() in your comment, also only use once for setting the "StringTable" of class Archive.

And the code use in my patch only 18 lines which including the merging of 32bit global symbol table and 64 bit global symbol table into a one and setting the two variable of Archive class (SymbolTable and StringTable), I think it is easy to understand.

but I get some idea from your comment to introduce the a new structure to reduce the variables.

DiggerLin updated this revision to Diff 500187.Feb 24 2023, 6:45 AM

fixed a bug

gentle ping.

gentle ping. if you have time , would you like to take sometime to review @jhenderson

Sorry for taking a long time to review this. Also, it will be important that somebody with AIX familiarity reviews this.

llvm/include/llvm/Object/Archive.h
413

See my comments elsewhere about naming.

llvm/lib/Object/Archive.cpp
1280

Use lower-case for function names (as stated before), and don't abbreviate things if the meaning can become ambiguous ("Glob" could refer to "Glob patterns" and is not obviously referring to Global).

Also, "Symtab" is a more common way of shortening "symbol table", so you may wish to change all your "SymTbl" names to that, although I'm less bothered by that one.

1299

You're being inconsistent with your "malformed AIX big archive" prefixes. You have it here, but not in the other error messages in the function.

1319

Either return a vector from the function, rather than pass it by reference, or change the function name to appendGlobalSymbolTableInfo or something to that effect, to better describe what it is doing.

1416–1418
1429

Why "8"? It would be better to avoid "magic numbers" like this. Give them names, or better yet directly use sizeof or similar to derive the value.

llvm/lib/Object/ArchiveWriter.cpp
455

I'd rename this to match the type you're fetching.

456

Move this variable declaration to where it is used for the first time.

458

I suggest this should be "Don't attempt to read non-symbolic file types."

475

I'm looking at this method again and I'm a little concerned that we end up potentially reading each member multiple times, which is not going to be a fast operation. I seem to remember (possibly in an unrelated patch - I've been looking at too many different ones to keep things straight) that you stored the SymbolicFile class as a member of the Archive member class, to avoid this issue elsewhere. Could you make use of that here, to avoid the additional reading in that this file?

485
543–544

To clarify, will getSymbol definitely always have been called before this point (no matter the options/contents of file etc)? If not, is it definitely always called afterwards?

546

Is there any code path that can hit this code when an error is present? If so, then you're going to get a crash on the line after this consumeError call, so you need to do something to fix that. If not, then this should be cantFail, not consumeError. The same goes for the consumeError call a little ways below.

645
899

Use "32-bit" when it is used as an adjective before the noun (as here). Use "32-bits" when it is not followed by a noun (e.g. "this file is 32-bits").

908–913

Some wording suggestions. Please reflow as appropriate.

I've added "may" before "contain", on the assumption that an AIX big archive with only 32-bit members won't contain a 64-bit symbol table. If it is always present, please delete the "may" from the first sentence.

1049
1063–1064

Field names in the comment don't match the variable names. Are the field names different?

1067–1068
1099
1104–1105

I think this comment would be a little clearer.

1107

The style elsewhere uses the suggested edit.

1110
DiggerLin updated this revision to Diff 517164.Apr 26 2023, 7:30 AM
DiggerLin marked 21 inline comments as done.

address comments, thanks James.

DiggerLin added inline comments.Apr 26 2023, 7:31 AM
llvm/include/llvm/Object/Archive.h
413

thanks

llvm/lib/Object/Archive.cpp
1299

there is " truncated or malformed archive" output for malformedError ,I delete the "malformed AIX big archive: " .

I will delete other "malformed AIX big archive:" in a separate patch.

llvm/lib/Object/ArchiveWriter.cpp
475

I only added two new members for struct MemberData

uint64_t PreHeadPadSize = 0;
 std::unique_ptr<SymbolicFile> SymFile = nullptr;

I think you maybe want to mention the comment https://reviews.llvm.org/D142479#inline-1379936

543–544

The function writeSymbolTable is called only when "WriteSymtab is true"

there is two places call the is64BitSymbolicFile(M.Data) before the function writeSymbolTable() be called.

  1. in function writeArchiveToStream()
if (isAIXBigArchive(Kind) && WriteSymtab) {
    Expected<bool> Is64BitOrErr = is64BitSymbolicFile(M.Data);
    if (Error E = Is64BitOrErr.takeError())
      return E;

    if (!*Is64BitOrErr)
      NumSyms32 += M.Symbols.size();
  }
  1. In computeMemberData
if (NeedSymbols) {
  Expected<std::vector<unsigned>> SymbolsOrErr =
      getSymbols(Buf, SymNames, HasObject);
  if (!SymbolsOrErr)
    return createFileError(M.MemberName, SymbolsOrErr.takeError());
  Symbols = std::move(*SymbolsOrErr);
}
546

The error should not be hit. I change to cantFail , thanks

899

thanks for explain.

908–913

I've added "may" before "contain", on the assumption that an AIX big archive with only 32-bit members won't contain a 64-bit symbol table.

Yes, AIX big archive may only has 32-bit global symbol table. thanks

1063–1064

there are same. the "GlobSymOffset" is the member of class "BigArchive"

    char GlobSymOffset[20];                  ///< Offset to global symbol table.
    char
        GlobSym64Offset[20]; ///< Offset global symbol table for 64-bit objects
```.
stephenpeckham added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
1064–1065

No need to check NewMembers.size(). GlobalSymbolOffset will be 0 if there are no members.

DiggerLin marked an inline comment as done.May 11 2023, 1:20 PM
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
1064–1065

agree. thanks, I will modify it.

stephenpeckham accepted this revision.May 15 2023, 7:25 AM

All issues have been addressed

This revision is now accepted and ready to land.May 15 2023, 7:25 AM
This revision was landed with ongoing or failed builds.May 18 2023, 7:55 AM
This revision was automatically updated to reflect the committed changes.
DiggerLin marked an inline comment as done.Aug 15 2023, 8:27 AM
llvm/test/Object/bigarchive-symboltable.test