This is an archive of the discontinued LLVM Phabricator instance.

[AIX] support 64bit global symbol table for big archive
ClosedPublic

Authored by DiggerLin on Jan 24 2023, 8:28 AM.

Details

Summary

In big archive , there is 32bit global symbol table and 64 bit global symbol table. llvm-ar only support 32bit global symbol table this moment, we need to support the 64 bit global symbol table.

https://www.ibm.com/docs/en/aix/7.2?topic=formats-ar-file-format-big

Global Symbol Tables

Immediately following the member table, the archive file contains two global symbol tables. The first global symbol table locates 32-bit file members that define global symbols; the second global symbol table does the same for 64-bit file members. If the archive has no 32-bit or 64-bit file members, the respective global symbol table is omitted. The strip command can be used to delete one or both global symbol tables from the archive. The fl_gstoff field in the fixed-length header contains the offset to the 32-bit global symbol table, and the fl_gst64off contains the offset to the 64-bit global symbol table.

Diff Detail

Event Timeline

DiggerLin created this revision.Jan 24 2023, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 8:28 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
DiggerLin requested review of this revision.Jan 24 2023, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 8:28 AM

I'm out of time for now. Please take a look at my inline comments and I'll come back to this at some point after you've addressed them (I have a big backlog to catch up on, so it may be a while).

llvm/include/llvm/Object/Archive.h
408

I think it's unlikely that this buffer is ever going to be "small", so it seems likely that it would just be better to store this a std::string ultimately.

llvm/lib/Object/Archive.cpp
1234

These new returns seem like they should be a separate patch. They aren't really anything to do with 64-bit global symbol table support.

1245
1263–1265

I would avoid long lambdas like this, as they make it hard to follow the code flow. Instead, I'd pull it into a separate function. You can have it return Error too, rather than passing in the error as a reference.

1281
1288
1289

This code all feels a bit too procedural. I think you could make things a little neater by introducing a small SymbolTableReader class. Something like the following rough outline:

class SymbolTableReader {
public:
  void readTable(...) {
    assert(MergedSymbolOffsets.empty() && MergedStringTables.empty()); // readTable should only be called before the getters.
    SymbolOffsets.push_back(...); // Add the offsets for this individual table
    StringTable.push_back(...); // Add the strings for this individual table.
  }

  StringRef getSymbolOffsetTable() const {
    if (SymbolOffsets.size() == 1)
      return SymbolOffsets[0];
    if (!MergedSymbolOffsets.empty())
      return MergedSymbolOffsets;

    uint64_t SymCount = 0;
    for (StringRef SymbolOffsetTable : SymbolOffsets)
      SymCount += /* read sym count */;
    raw_svector_ostream Stream(MergedSymbolOffsets);
    // Write the SymCount to the stream.
    // Loop over and write the symbol offsets to the stream.
    return MergedSymbolOffsets;
  }

  StringRef getStringTable() const {
    if (StringTables.size() == 1)
      return StringTables[0];
    if (!MergedStringTables.empty())
      return MergedStringTables;

    // Combine all the strings in StringTables and store in MergedStringTables
    return MergedStringTables;
  }

private:
  StringRefVector SymbolOffsets;
  StringRefVector StringTables;

  mutable std::string MergedSymbolOffsets;
  mutable std::string MergedStringTables;
};
1302

Again, I'd prefer this to be a separate function.

llvm/lib/Object/ArchiveWriter.cpp
419

The fact that there's another is64BitSymbolFile() should tell you that you need to reorganise some code to avoid code duplication.

DiggerLin marked 3 inline comments as done.Feb 1 2023, 12:22 PM
DiggerLin added inline comments.
llvm/include/llvm/Object/Archive.h
408

there are null characters between each symbol name to separate the symbol names in the global symbol table, I do not think we can use the std::string to store it.

and the base class of SmallString is SmallVector, According to the comment in the SmallVector.h, "SmallVectors are used for buffering bitcode output - which can exceed 4GB.",

llvm/lib/Object/ArchiveWriter.cpp
419

I createed a new NFC patch (https://reviews.llvm.org/D143097) which will add a new virtual API Is64bit() for SymbolicFile

DiggerLin marked 2 inline comments as done.Feb 1 2023, 12:23 PM

I'm out of time for now. Please take a look at my inline comments and I'll come back to this at some point after you've addressed them (I have a big backlog to catch up on, so it may be a while).

Thanks a lot for your time to review.

jhenderson added inline comments.Feb 2 2023, 12:53 AM
llvm/include/llvm/Object/Archive.h
408

there are null characters between each symbol name to separate the symbol names in the global symbol table, I do not think we can use the std::string to store it.

std::string is perfectly capable of storing null bytes. You just can't use the std::string(const char *) constructor to build it (you need to use one which takes a size too).

and the base class of SmallString is SmallVector, According to the comment in the SmallVector.h, "SmallVectors are used for buffering bitcode output - which can exceed 4GB.",

Have you read the Programmer's Manual? FWIW, I couldn't figure out from that which is the more appropriate to use. I guess in the end, it'll depend on how you end up populating it as to which is better.

DiggerLin updated this revision to Diff 494682.Feb 3 2023, 10:19 AM
DiggerLin marked 6 inline comments as done.

address comment and add bit code into test case.

llvm/lib/Object/Archive.cpp
1234

OK, I will do it.

1289

when class Archive , it need set two member

StringRef SymbolTable;
 StringRef StringTable;

for symbol iterating.

we do not need a variable "SymbolOffsets" here.

and for getStringTable() in your comment, also only use once for setting the "StringTable" of class Archive.

And the code use in my patch only 18 lines which including the merging of 32bit global symbol table and 64 bit global symbol table into a one and setting the two variable of Archive class (SymbolTable and StringTable), I think it is easy to understand.

but I get some idea from your comment to introduce the a new structure to reduce the variables.

DiggerLin updated this revision to Diff 500187.Feb 24 2023, 6:45 AM

fixed a bug

gentle ping.

gentle ping. if you have time , would you like to take sometime to review @jhenderson

Sorry for taking a long time to review this. Also, it will be important that somebody with AIX familiarity reviews this.

llvm/include/llvm/Object/Archive.h
408

See my comments elsewhere about naming.

llvm/lib/Object/Archive.cpp
1171

Use lower-case for function names (as stated before), and don't abbreviate things if the meaning can become ambiguous ("Glob" could refer to "Glob patterns" and is not obviously referring to Global).

Also, "Symtab" is a more common way of shortening "symbol table", so you may wish to change all your "SymTbl" names to that, although I'm less bothered by that one.

1190

You're being inconsistent with your "malformed AIX big archive" prefixes. You have it here, but not in the other error messages in the function.

1210

Either return a vector from the function, rather than pass it by reference, or change the function name to appendGlobalSymbolTableInfo or something to that effect, to better describe what it is doing.

1290–1292
1303

Why "8"? It would be better to avoid "magic numbers" like this. Give them names, or better yet directly use sizeof or similar to derive the value.

llvm/lib/Object/ArchiveWriter.cpp
404

I'd rename this to match the type you're fetching.

405

Move this variable declaration to where it is used for the first time.

407

I suggest this should be "Don't attempt to read non-symbolic file types."

424

I'm looking at this method again and I'm a little concerned that we end up potentially reading each member multiple times, which is not going to be a fast operation. I seem to remember (possibly in an unrelated patch - I've been looking at too many different ones to keep things straight) that you stored the SymbolicFile class as a member of the Archive member class, to avoid this issue elsewhere. Could you make use of that here, to avoid the additional reading in that this file?

434
470–471

To clarify, will getSymbol definitely always have been called before this point (no matter the options/contents of file etc)? If not, is it definitely always called afterwards?

473

Is there any code path that can hit this code when an error is present? If so, then you're going to get a crash on the line after this consumeError call, so you need to do something to fix that. If not, then this should be cantFail, not consumeError. The same goes for the consumeError call a little ways below.

509
730

Use "32-bit" when it is used as an adjective before the noun (as here). Use "32-bits" when it is not followed by a noun (e.g. "this file is 32-bits").

739–744

Some wording suggestions. Please reflow as appropriate.

I've added "may" before "contain", on the assumption that an AIX big archive with only 32-bit members won't contain a 64-bit symbol table. If it is always present, please delete the "may" from the first sentence.

835
849–850

Field names in the comment don't match the variable names. Are the field names different?

866–867
914
919–920

I think this comment would be a little clearer.

922

The style elsewhere uses the suggested edit.

925
DiggerLin updated this revision to Diff 517164.Apr 26 2023, 7:30 AM
DiggerLin marked 21 inline comments as done.

address comments, thanks James.

DiggerLin added inline comments.Apr 26 2023, 7:31 AM
llvm/include/llvm/Object/Archive.h
408

thanks

llvm/lib/Object/Archive.cpp
1190

there is " truncated or malformed archive" output for malformedError ,I delete the "malformed AIX big archive: " .

I will delete other "malformed AIX big archive:" in a separate patch.

llvm/lib/Object/ArchiveWriter.cpp
424

I only added two new members for struct MemberData

uint64_t PreHeadPadSize = 0;
 std::unique_ptr<SymbolicFile> SymFile = nullptr;

I think you maybe want to mention the comment https://reviews.llvm.org/D142479#inline-1379936

470–471

The function writeSymbolTable is called only when "WriteSymtab is true"

there is two places call the is64BitSymbolicFile(M.Data) before the function writeSymbolTable() be called.

  1. in function writeArchiveToStream()
if (isAIXBigArchive(Kind) && WriteSymtab) {
    Expected<bool> Is64BitOrErr = is64BitSymbolicFile(M.Data);
    if (Error E = Is64BitOrErr.takeError())
      return E;

    if (!*Is64BitOrErr)
      NumSyms32 += M.Symbols.size();
  }
  1. In computeMemberData
if (NeedSymbols) {
  Expected<std::vector<unsigned>> SymbolsOrErr =
      getSymbols(Buf, SymNames, HasObject);
  if (!SymbolsOrErr)
    return createFileError(M.MemberName, SymbolsOrErr.takeError());
  Symbols = std::move(*SymbolsOrErr);
}
473

The error should not be hit. I change to cantFail , thanks

730

thanks for explain.

739–744

I've added "may" before "contain", on the assumption that an AIX big archive with only 32-bit members won't contain a 64-bit symbol table.

Yes, AIX big archive may only has 32-bit global symbol table. thanks

849–850

there are same. the "GlobSymOffset" is the member of class "BigArchive"

    char GlobSymOffset[20];                  ///< Offset to global symbol table.
    char
        GlobSym64Offset[20]; ///< Offset global symbol table for 64-bit objects
```.
stephenpeckham added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
878–879

No need to check NewMembers.size(). GlobalSymbolOffset will be 0 if there are no members.

DiggerLin marked an inline comment as done.May 11 2023, 1:20 PM
DiggerLin added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
878–879

agree. thanks, I will modify it.

stephenpeckham accepted this revision.May 15 2023, 7:25 AM

All issues have been addressed

This revision is now accepted and ready to land.May 15 2023, 7:25 AM
This revision was landed with ongoing or failed builds.May 18 2023, 7:55 AM
This revision was automatically updated to reflect the committed changes.
DiggerLin marked an inline comment as done.Aug 15 2023, 8:27 AM
llvm/test/Object/bigarchive-symboltable.test