This is an archive of the discontinued LLVM Phabricator instance.

[AIX] supporting -X options for llvm-ranlib in AIX OS
ClosedPublic

Authored by DiggerLin on Jan 26 2023, 1:04 PM.

Details

Summary

llvm-ar is symlinked as llvm-ranlib and will act as ranlib when invoked in that mode. llvm-ar since compiler/llvm-project@4f2cfbe supports the -X options, but doesn't seem to accept them when running as llvm-ranlib.

In AIX OS , according to https://www.ibm.com/docs/en/aix/7.2?topic=r-ranlib-command

-X mode Specifies the type of object file ranlib should examine. The mode must be one of the following:

32

Processes only 32-bit object files

64

Processes only 64-bit object files

32_64

Processes both 32-bit and 64-bit object files

The default is to process 32-bit object files (ignore 64-bit objects). The mode can also be set with the OBJECT_MODE environment variable. For example, OBJECT_MODE=64 causes ranlib to process any 64-bit objects and ignore 32-bit objects. The -X flag overrides the OBJECT_MODE variable.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
DiggerLin retitled this revision from [AIX] llvm-ranlib accept -X options when acting as in AIX OS to [AIX] llvm-ranlib accept -X options in AIX OS.Jan 26 2023, 1:09 PM
DiggerLin retitled this revision from [AIX] llvm-ranlib accept -X options in AIX OS to [AIX] supportting -X options for llvm-ranlib in AIX OS.Jan 26 2023, 3:37 PM
DiggerLin retitled this revision from [AIX] supportting -X options for llvm-ranlib in AIX OS to [AIX] supporting -X options for llvm-ranlib in AIX OS.

I'm not sure I understand why the logic for llvm-ranlib seems to be significantly different to how the option is implemented for llvm-ar? Could you explain this, please?

llvm/tools/llvm-ar/llvm-ar.cpp
73–74

According to the AIX ranlib docs, this -X option "Specifies the type of object file ranlib should examine." This doesn't really line up with the description you've used here.

DiggerLin marked an inline comment as done.EditedFeb 3 2023, 2:05 PM

I'm not sure I understand why the logic for llvm-ranlib seems to be significantly different to how the option is implemented for llvm-ar? Could you explain this, please?

For llvm-ranlib , if big_archive do not have any global symbol table, it generate global symbol based on the -X option. if there is any global symbol table(no matter 32-bit or 64-bit), it do not any thing. (please see the code ,https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-ar/llvm-ar.cpp#L1076 , but in AIX 'ranlib -X32' , if there is not a 32-bit global symbol table, AIX 'ranlib -X32' will continue to generate a 32-bit global symbol table even there is 64-bit global symbol table. I will discuss with my team , whether do I need to implement some behavior as AIX 'ranlib` )

For llvm-ar (and AIX 'ar'), the -X option only work on a object file in the command line which will be added replaced, deleted or moved,. The big archive may has the 64-bit object files and 32-bit object files. it will be generated the global symbol table for on the all the object files in the big archive(-X do not apply when generate the global symbol table). there maybe has 32-bit global symbol table and 64-bit global symbol table at the same time(aix 'ar' has the same behaviors)

llvm/tools/llvm-ar/llvm-ar.cpp
73–74

yes. any suggestion on the description ?

DiggerLin updated this revision to Diff 495270.Feb 6 2023, 2:05 PM
DiggerLin marked an inline comment as done.

In my previous implementation, It will only work on big archives with an empty global symbol table. if there is already a global symbol table (regardless of whether it's 32-bit or 64-bit) in the big archive, then llvm-ranlib -X will not do anything. (For example, if a big archive has a 64-bit global symbol table, then llvm-ranlib -X32 will not do anything). This is different from the AIX ranlib, where if there is only a 64-bit global symbol table, then -X32 will generate a 32-bit global symbol table, resulting in both the 32-bit and 64-bit global symbol tables being present in the big archive at the same time. I added the new functionality as AIX OS "ranlib"

jhenderson added inline comments.Mar 30 2023, 12:40 AM
llvm/include/llvm/Object/Archive.h
414

As commented in other patches, I'd not abbreviate "Global" to "Glob". Just use "Global" in your variable and function names.

llvm/tools/llvm-ar/llvm-ar.cpp
73–74

Well I'd expect it to be similar to the AIX ranlib doc?

Also, please remember to include spaces before opening parentheses in English prose.

231

Maybe I'm missing something, but I don't see why you need to make this a custom type. You already have the BitMode value that you read in from the environment/command-line, and so you can just use that in conjunction with the Symtab boolean to determine what symbol table to write, surely?

1084–1085

Just to make sure I'm following what this is trying to say: for Big Archives, a -X64 specification will ignore the 32-bit symbol table and generate a 64-bit one (the 32-bit one will remain in place, if present), and using -X32 will cause a 32-bit symbol table to be generated, right?

Assuming that's the case, I suggest the following as the comment:

"For archives in the Big Archive format, the bit mode option specifies which symbol table to generate. The presence of a symbol table that does not match the specified bit mode does not prevent creation of the symbol table that has been requested."

1088

Personally, I'd say that this assert is unnecessary, but if you want to keep it, that's fine.

1417

I think you can share this code with the llvm-ar version of the code (put it in a function). You can either add an "AcceptAny" member to that function's argument, or simply check if the value is "Any" after the function here.

1419

Does it treat "any" as "32" or does it report an error? If the latter, should we not do the same here?

1434

Similar to above, could we refactor things slightly to share the code for parsing the -X option between llvm-ranlib and llvm-ar?

1441

Delete this blank line.

DiggerLin updated this revision to Diff 517638.EditedApr 27 2023, 10:49 AM
DiggerLin marked 9 inline comments as done.

address comment and set option -Xany as invalid option and environment OBJECT-MODE=any as invalid too(let it has same behaviors' as "ranlib" of AIX OS).

-bash-5.0$ export OBJECT_MODE=31
-bash-5.0$ ranlib  t_X32.a
0654-603 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, or 32_64.
-bash-5.0$ ranlib  -X31 t_X32.a
0654-602 The specified object mode is not valid.
        Specify -X32, -X64, or -X32_64.
-bash-5.0$ export OBJECT_MODE=any
-bash-5.0$ ranlib  t_X32.a
0654-603 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, or 32_64.
-bash-5.0$  ranlib  -Xany nosym.a
0654-602 The specified object mode is not valid.
        Specify -X32, -X64, or -X32_64.
-bash-5.0$  ranlib  -X32 t_X32.a
-bash-5.0$
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 10:49 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
DiggerLin added inline comments.Apr 27 2023, 10:49 AM
llvm/tools/llvm-ar/llvm-ar.cpp
231

the Symtab are use to specify whether the symbol table need to write(for non-AIX). and what the symbol table need to write for AIX OS.

there are function like

writeArchiveToBuffer(ArrayRef<NewArchiveMember> NewMembers,
                   WriteSymTabType WriteSymtab, object::Archive::Kind Kind,
                   bool Deterministic, bool Thin)

and

Error writeArchive(StringRef ArcName, ArrayRef<NewArchiveMember> NewMembers,
                   WriteSymTabType WriteSymtab, object::Archive::Kind Kind,
                   bool Deterministic, bool Thin,
                   std::unique_ptr<MemoryBuffer> OldArchiveBuf = nullptr);

call the function as

writeArchiveToBuffer(M.second.getMembers(),
                     /*WriteSymtab=*/true,
                     /*Kind=*/object::Archive::K_DARWIN,
                     C.Deterministic,
                     /*Thin=*/false);

I do not want to change the calling parameter of the /*WriteSymtab=*/true,

I introduced a new class WriteSymTabType which has following API

WriteSymTabType(bool PrintSym) { Value = PrintSym ? Both : None; }
 void operator=(bool PrintSym) { Value = PrintSym ? Both : None; }
 operator bool() { return Value != None; }
1084–1085

yes, you are correct. I change the comment to your suggest.

1417

In AIX OS ,since ranlib do not support -Xany. it is different to share code with llvm-ar

-bash-5.0$ ranlib -Xany
0654-602 The specified object mode is not valid.

Specify -X32, -X64, or -X32_64.
1419

-Xany , I will report an error.

1434

the logic of parsing option is different with ar_main, it is difficult to share the code. and the llvm-ranlib do not support -Xany, and there only a few lines of duplication code, I do not think it worth us to put effort to share the code.

Do the -U and -D flags have any effect on the behavior of llvm-ranlib?

llvm/test/tools/llvm-ranlib/aix-X-option.test
17

what about OBJECT_MODE= (defined, but empty value)

llvm/tools/llvm-ar/llvm-ar.cpp
73

I think the AIX documentation for ranlib isn't as helpful as it could be. I actually like a variation of the original message better:

"-X {32|64|32_64} - Specifies which archive symbol tables should be generated if they do not already exist (AIX OS only)\n"

This implies that a 32-bit (64-bit) global symbol table is generated by examining XCOFF32 (XCOFF64) members.

But this wording doesn't really fit with the command description: Generate an index for archives. Should this be "Generate an index or symbol tables for archives"? Or just "Generate symbol tables for archives"? The usage message for llvm-ar also mixes "index" and "symbol table"

121

"Index" or "symbol table"? See the related comment about the usage message for "ranlib".

1274

AIX commands differentiate between OBJECT_MODE='' (an empty string) and OBJECT_MODE not defined. This function treats them the same way.

-X '' (an empty string) should also be an error. I would return Unknown for case "". For the Default case, if RawBitMode is NULL, Bit32 should be returned.

DiggerLin updated this revision to Diff 522625.May 16 2023, 8:11 AM
DiggerLin marked 4 inline comments as done.
DiggerLin added a reviewer: stephenpeckham.

address Stephen's comment, thanks Stephen.

DiggerLin added inline comments.May 16 2023, 8:12 AM
llvm/test/tools/llvm-ranlib/aix-X-option.test
17

I added the test case , AIX ranlib treats OBJECT_MODE= (defined, but empty value) as 32-bit object mode, I implement llvm-ranlib same AIX ranlib

llvm/tools/llvm-ar/llvm-ar.cpp
73

I think the llvm-ranlib generates the global symbol table, index is to general. if we want to change the description, Maybe the "Generate symbol tables for archives" is better and we should create a separate patch for it. what do you think.@jhenderson

121

if we want to change to "symbol table" , we need a separate NFC for it. thanks

1274

AIX commands differentiate between OBJECT_MODE='' (an empty string) and OBJECT_MODE not defined. This function treats them the same way.

I tried AIX ranlib command , it looks both OBJECT_MODE='' (an empty string) and "OBJECT_MODE not defined" as 32-bit by default.

X '' (an empty string) should also be an error. I would return Unknown for case "".

I will fix it , thanks

I haven't looked again at the rest of this patch. I'll do so hopefully in the next couple of weeks.

llvm/tools/llvm-ar/llvm-ar.cpp
73

So a quick look at GNU ranlib shows it uses the term "index". This makes a degree of sense: the "symbol table" is simply a map between symbol name and which archive member it is in, much like at the end of a book is from topics to pages. In other words "index" and "symbol table" mean the same thing in this context and can be used interchangeably.

Given that GNU uses the "index" terminology, I don't think we should change the main llvm-ranlib description. I'm happy for "symbol table" or "index" to be used in the help text here for this option.

DiggerLin updated this revision to Diff 523759.May 19 2023, 7:24 AM

rebase to latest master after the patch https://reviews.llvm.org/D142479 [AIX] support 64bit global symbol table for big archive landed.

My apologies, this patch fell off my review queue somehow.

I still have concerns about the option-parsing logic being duplicated, but I'm out of time to review it now. I'll try to look tomorrow.

llvm/test/tools/llvm-ranlib/aix-X-option.test
17
18

Doesn't this line want an llvm-nm check after it, like the others?

32
49
56

It would be better to move this block of test cases below the check patterns that are used for the valid cases.

63

Nit: here and throughout - check patterns usually have a space between '#' and the pattern name, e.g. "# GLOB32:"

Rather than the somewhat confusing GL64 versus GLOB64 (and similiarly for the 32-bit case), how about "HAS64" and "NO64" (and "HAS32" and "NO32")? That being said, see my comment below about ordering...

68

I'm concerned you're going to have an ordering issue in one of the 64 bit or 32 bit "NOT" cases. FileCheck -NOT patterns only check the region between the previous non-NOT pattern before the -NOT one up to the next non-NOT pattern (or the end of the file). So, as things stand, --check-prefixes=GLOB32,GL64 will verify that the 32-bit table appears and then no 64-bit table is printed AFTER the 32-bit table, but not whether the 64-bit table appears BEFORE the 32-bit table. Similarly, --check-prefixes=GLOB64,GL32 will check that the 64-bit table is printed and then no 32-bit table is printed after it, but not whether the 32-bit table is printed before it.

Given that the names of the symbols are irrelevant to this test case, and really all that's interesting is which object they're in (since this test isn't about checking that the symbol tables have the correct contents - that is the duty of a test that doesn't make use of the -X option), could you simplify the input objects to have a single like-named symbol and then simply ensure the correct object names appear in the output? You could use --implicit-check-not to check that e.g. "in t64" or "in t32" don't appear in your output.

If you adopted this approach, you'd have two CHECK lines as e.g. follows:

# GLOB32:     symbol1 in t32_1.o
# GLOB32-NEXT: symbol2 in t32_2.o

# GLOB64:     symbol1 in t64_1.o
# GLOB64-NEXT: symbol2 in t64_2.o

And you'd have FileCheck lines that look like one of the following:

FileCheck --check-prefixes=GLOB32 --implicit-check-not="in t64"
FileCheck --check-prefixes=GLOB64 --implicit-check-not="in t32"
FileCheck --check-prefixes=GLOB32,GLOB64

By simplifying down to one symbol each, you could also pass the symbol name in as a yaml2obj -D value and only have one YAML file in the test.

83–86

These aren't used. Delete them.

88

"value" would be better than "setting" (environment variables have values, not settings).

Please also re-review the coding standards regarding error messages and fix the two separate issues in this and the below error message.

llvm/tools/llvm-ar/llvm-ar.cpp
73–74

Also, please remember to include spaces before opening parentheses in English prose.

The "English prose" bit was referring to the description, not how the "-X {32|64|32_64}" bit was printed. Feel free to remove the space like you had before in this case (i.e. either "-X{32|64|32_64}" or "-X {32|64|32_64}" or acceptable).

74–75

Please reflow this string. That should make it fairly obvious that there's a mistake in it too (hint: print the help and spot the missing space...).

231

Having looked at this again, I really don't like the implicit conversion semantics of the new class, and I don't see why you can't just extend the writeArchiveToBuffer etc parameters to pass in a BitMode parameter.

1084–1085

The comment doesn't seem to have been changed to my suggestion.

1415

"AIX" looks like it is usually written all-upper-case, so the variable name should be "HasAIXXOption".

jhenderson added inline comments.Jul 19 2023, 12:54 AM
llvm/test/tools/llvm-ranlib/aix-X-option.test
4

Nit: trailing whitespace

llvm/tools/llvm-ar/llvm-ar.cpp
1396

Unrelated to this patch, but I spotted this when comparing the llvm-ar and llvm-ranlib parsing logic: what happens if -X is the very final argument, without a value? E.g. llvm-ar (any other args...) -X?

1439

If I'm reading this correctly, llvm-ranlib will accept "-X32" but reject "-X 32". Is that intentional? If so, what's the motivation behind it (I would say that there needs to be a motivation other than "that's what AIX ranlib does)?

1444

I know that AIX ranlib doesn't accept "any" whereas I believe AIX ar does. I wonder though what the benefit is of preventing llvm-ranlib from accepting "any"? Dropping that special case would simplify the code.

1468

Is there a particular reason that this is after the command-line option parsing for llvm-ranlib, but before it in llvm-ar? If this were before the option parsing, you wouldn't need the HasAixXOption variable.

1474–1476

This is an inconsistency with llvm-ar's current behaviour: llvm-ar treats Unknown as 32, whereas here you're outright rejecting it. I'm not convinced this inconsistency makes sense, nor do I see it benefiting the user. In my opinion the two should do the same thing. I think a garbage string should be outright rejected in both cases. An empty string for the environment variable or command-line option should probably be rejected too, but I'm okay with it being accepted, if AIX ranlib behaves that way. However, I think llvm-ranlib and llvm-ar should do the same whatever that is.

Once these inconsistencies have been ironed out, I think it'll be a lot simpler to share code between the two tools.

DiggerLin marked 14 inline comments as done.EditedJul 19 2023, 2:07 PM

I still have concerns about the option-parsing logic being duplicated, but I'm out of time to review it now. I'll try to look tomorrow.

In AIX OS , nm and ar , do not accept invalid env var OBJECT_MODE, for example, I will create two patch to fix the behavior of llvm-nm and llvm-ar (they look invalid env var OBJECT_MODE as default 32bit currently)

bash-5.0$ env OBJECT_MODE=31 nm   d_default.o
0654-216 nm: The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, 32_64, d64 or any.
-bash-5.0$ env OBJECT_MODE=31 ar  -q tmpk.a d_default.o
ar: 0707-133 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, 32_64, d64, or any.
llvm/test/tools/llvm-ranlib/aix-X-option.test
18

yes, thanks

68

thanks for detail explain, I changed as your suggestion.

llvm/tools/llvm-ar/llvm-ar.cpp
74–75

thanks

231

without implicit conversion, if I add a new parameter BitMode to writeArchiveToBuffer, I also need to add the new parameter BitMode to function writeArchive() and writeArchiveToStream() and there is a lot of place(files) call the function writeArchive, I need to modify all the caller too.

DiggerLin marked 10 inline comments as done.Jul 19 2023, 2:37 PM
DiggerLin added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1396
-bash-5.0$ /home/zhijian/llvm/dev/build/bin/llvm-ar -X  tmpk.a d_default.o

/home/zhijian/llvm/dev/build/bin/llvm-ar: error: invalid bit mode: tmpk.a


-bash-5.0$ ar -X tmpk.a d_default.o
ar: 0707-132 The specified object mode is not valid.
        Specify -X32, -X64, -X32_64, -Xd64, or -Xany

-bash-5.0$llvm-ranlib -X  tmpk.a d_default.o
llvm-ranlib: error: the specified object mode is not valid. Specify -X32, -X64, or -X32_64

-bash-5.0$ ranlib -X  tmpk.a d_default.o
0654-602 The specified object mode is not valid.
        Specify -X32, -X64, or -X32_64.

I will change the behaviors of llvm-ar in a new separate patch.

1439

thanks. add functionality to accept -X 32

1444

agree with you. but we discussed about whether to accept any in our internal , we decide to keep all the behavior as AIX ranlib

1468

in AIX OS ranlib has behavior as

-bash-5.0$ env OBJECT_MODE=31 ranlib tmpk.a
0654-603 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, or 32_64.
-bash-5.0$ env OBJECT_MODE=31 ranlib -X32  tmpk.a
-bash-5.0$

Given invalid env OBJECT_MODE , if there is no -X option in the ranlib command, it will output as

0654-603 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, or 32_64.

Given invalid env OBJECT_MODE , and there is -X option in the ranlib command, it do not care about the invalid env OBJECT_MODE, So I has to parse the -X option before the getBitMode(getenv("OBJECT_MODE"))

1474–1476

In AIX OS , nm and ar , do not accept invalid env var OBJECT_MODE, for example, I will create two separate patches to fix the behavior of llvm-nm and llvm-ar (they look invalid env var OBJECT_MODE as default 32bit)

bash-5.0$ env OBJECT_MODE=31 nm   d_default.o
0654-216 nm: The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, 32_64, d64 or any.
-bash-5.0$ env OBJECT_MODE=31 ar  -q tmpk.a d_default.o
ar: 0707-133 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, 32_64, d64, or any.
DiggerLin updated this revision to Diff 542209.Jul 19 2023, 2:56 PM
DiggerLin marked 5 inline comments as done.

address comment

DiggerLin updated this revision to Diff 542660.EditedJul 20 2023, 2:05 PM
  1. rebase the code
  2. for the env OBJECT_MODE= or is env OBJECT_MODE="" , the ranlib and ar in AIX OS , have different behaviors(ranlib looks as -X32 , but ar and nm output error ),

I change llvm-ranlib as the same behavior as ar and llvm-ar, output an error(I think output error is more reasonable). and look env OBJECT_MODE unset as default 32-BIT for llvm-ranlib as AIX OS command ranlib , nm ,ar.

bash-5.0$ env OBJECT_MODE= ar -q tmp.a bug.patch
ar: 0707-133 The OBJECT_MODE environment variable has an invalid setting.
OBJECT_MODE must be 32, 64, 32_64, d64, or any.
bash-5.0$ env OBJECT_MODE="" ar -q tmp.a bug.patch
ar: 0707-133 The OBJECT_MODE environment variable has an invalid setting.
OBJECT_MODE must be 32, 64, 32_64, d64, or any.
bash-5.0$ env OBJECT_MODE="" ranlib tmpk.a
bash-5.0$ env OBJECT_MODE= ranlib tmpk.a

jhenderson added inline comments.Jul 25 2023, 1:24 AM
llvm/test/tools/llvm-ranlib/aix-X-option.test
18–19

Assuming the unsetting is intended to be just for the llvm-ranlib line, I believe the preferred approach is env -u OBJECT_MODE llvm-ranlib .... That way, you don't impact the behaviour in subsequent lines.

32

Marked as done but not addressed? Please don't do that...

jhenderson added inline comments.Jul 25 2023, 1:24 AM
llvm/tools/llvm-ar/llvm-ar.cpp
74–75

You didn't actually reflow the string. I believe the following is more correct (I haven't checked the column width, so make sure to run clang-format afterwards:

"should be generated if they do not already exist (AIX OS only)\n";
231

I also need to add the new parameter BitMode to function writeArchive() and writeArchiveToStream() and there is a lot of place(files) call the function writeArchive, I need to modify all the caller too.

I don't think this is a good reason not to do this, given that it results in an (in my opinion) suboptimal interface. That being said, you could move the new parameter I'm suggesting to the end of the argument list and use a default value, if it helps reduce the impact.

1444

we decide to keep all the behavior as AIX ranlib

We aren't in your internal company here. This is the open source community, therefore you need to be able to justify your decisions in the open source conversation. What reason is there to keep rejecting this in llvm-ranlib? Perhaps worth asking yourself is "if we could control it, why would we keep that behaviour in AIX ranlib?".

1468

So with AIX ar, does an invalid OBJECT_MODE environment variable get reported if the -X option is specified?

In my opinion, I think it is very unlikely there will be any real users out there with an invalid OBJECT_MODE environment variable, because other tools will reject it, even if ranlib doesn't. Even if there are, they should be able to easily fix their variable, if they start getting an error message after switching to llvm-ranlib. I'm therefore thinking that there isn't a need to mirror this logic in llvm-ranlib, if it would make the code simpler (which it would).

DiggerLin marked an inline comment as done.Jul 25 2023, 6:47 AM
DiggerLin added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
231

if I add a need BitMode enum
{None,
Bit32,
Bit64,
Bit32_64
};

Do you think we still need the parameter "bool WriteSymtab" , I think we can use the BitMode=None to imply WriteSymtab=false, using BitMode!=None imply WriteSymtab=true. @jhenderson

DiggerLin marked 2 inline comments as done.Jul 25 2023, 10:10 AM
DiggerLin added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1444

according to
https://www.ibm.com/docs/en/aix/7.1?topic=ar-command

-X mode , there is 32, 64, 32_64, d64, any mode

d64
Examines discontinued 64-bit XCOFF files (magic number == U803XTOCMAGIC).

we do not support d64in llvm(since it is discontinued), but we keep any option in llvm-ar in case of we want to use llvm-ar to replace AIX ar in some AIX shell script which has option any for ar (any = 32_64 + d64), we do no want to modify the option from any to 32_64 for AIX shell script, so we keep the any option for llvm-ar.

for AIX ranlib, https://www.ibm.com/docs/en/aix/7.2?topic=r-ranlib-command . it only support 32,64,32_64, It do not support d64, so there is no any option for AIX ranlib, we do not need to add a additional any for llvm-ranlib

DiggerLin marked 3 inline comments as done.Jul 25 2023, 10:47 AM
DiggerLin added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1468

So with AIX ar, does an invalid OBJECT_MODE environment variable get reported if the -X option is specified?

it do no report invalid OBJECT_MODE environment if the -X option is specified.

bash-5.0$ export OBJECT_MODE=31
bash-5.0$ ar  q tmpp.acheck.o
ar: 0707-133 The OBJECT_MODE environment variable has an invalid setting.
        OBJECT_MODE must be 32, 64, 32_64, d64, or any.
bash-5.0$ ar -X32 q tmpp.acheck.o
bash-5.0$

In my opinion, I think it is very unlikely there will be any real users out there with an invalid OBJECT_MODE environment variable, because other tools will reject it, even if ranlib doesn't. Even if there are, they should be able to easily fix their variable, if they start getting an error message after switching to llvm-ranlib. I'm therefore thinking that there isn't a need to mirror this logic in llvm-ranlib, if it would make the code simpler (which it would).

since -X option will overwrite the env variable OBJECT_MODE,
in the source code , we should check the -X option of the command line first, if there is no -X option , we will check the env variable OBJECT_MODE. the logic of code is correct in the patch. we should not always getenv("OBJECT_MODE") before checking the -X option.

DiggerLin marked 2 inline comments as done.Jul 25 2023, 11:56 AM
DiggerLin added inline comments.
llvm/test/tools/llvm-ranlib/aix-X-option.test
18–19

the command env of AIX OS do not support -u. and there is no mapping option of env -u

https://www.ibm.com/docs/en/aix/7.2?topic=e-env-command

DiggerLin updated this revision to Diff 544066.EditedJul 25 2023, 12:01 PM
DiggerLin marked an inline comment as done.

addressed partial comments, after deciding whether to delete the bool NeedSymbols parameter from functions API,(writeArchiveToStream etc), I will upload a update patch.

jhenderson added inline comments.Jul 26 2023, 2:09 AM
llvm/test/tools/llvm-ranlib/aix-X-option.test
18–19

I'm 90% certain that env here is the built-in lit env, not a system env executable. env -u seems to be only rarely used in the test suite, but see llvm/utils/lit/tests/Inputs/shtest-env/env-u.txt for an example. I assume you can run this test locally?

69

Actually, you shouldn't have added "the" here. The grammatical rules are a little tricky to explain, so I won't bother, but you should actually say "Test invalid -X options and OBJECT_MODE environment variables."

llvm/tools/llvm-ar/llvm-ar.cpp
231

Well the BitMode is Big Archive specific, and so isn't actually relevant for any other archive format. I don't think it's clear to a maintainer who is unfamiliar with Big Archive that "BitMode = None" implies 2don't write a symbol table". Really, I think the whole llvm-ar.cpp code could do with some major refactoring to introduce more object orientation, so that the details of different formats can be hidden from each other, but that's outside the scope of this patch (in this specific case, I'd have a SymtabWriter class with NoSymtabWriter, BigArchiveSymtabWriter32, and so on subclasses - these would then have methods that did the appropriate writing of symbol tables).

I do see the appeal of a single enum parameter in the meantime. I would suggest it should be called SymtabWritingMode and have values NoSymtab, NormalSymtab, BigArchive64Bit, BigArchive32Bit, and BigArchiveBoth. You could optionally drop one of the BigArchive* values, and treat NormalSymtab as that value for BigArchive (as long as "Normal" is clear as to its meaning to someone familiar with BigArchive). All existing WriteSymtab false values would be replaced with NoSymtab and true with NormalSymtab (except for BigArchive cases).

1444

we do not need to add a additional any for llvm-ranlib

As noted earlier, adding an any value would align llvm-ranlib and llvm-ar's command-line options, allowing the code to be simpler. It doesn't break existing users by permitting it either. You have explained why you aren't accepting it, but not actually what the benefit is of that approach. What is the benefit of NOT accepting any in llvm-ranlib?

1468

since -X option will overwrite the env variable OBJECT_MODE, in the source code , we should check the -X option of the command line first, if there is no -X option , we will check the env variable OBJECT_MODE. the logic of code is correct in the patch. we should not always getenv("OBJECT_MODE") before checking the -X option.

I don't agree. It is normal to validate all of a a user's input options (including environment variables that are a proxy for command-line options) even if they don't end up getting used. Consider a similar, admittedly somewhat arbitrary case. If you had one command called e.g. --doTheThing whose sole purpose was to ensure that some if block was run, and then another option --doTheThingThisWay={methodA|methodB} which controlled what happens within that if block, it would be perfectly normal and acceptable to check the value passed to --doTheThingThisWay even without --doTheThing being specified. The code would typically look like this pseudo code:

Config processCommandLine(Args args) {
  Config cfg;
  if ("doTheThing" in args) {
    cfg.doTheThing = true;
  }
  if ("doTheThingThisWay" in args) {
    switch(args["doTheThingThisWay"].value) {
       case "methodA":
        cfg.doTheThingThisWay = DoTheThingThisWayKind::MethodA;
        break;
       case "methodB":
        cfg.doTheThingThisWay = DoTheThingThisWayKind::MethodB;
        break;
      default:
        reportError("invalid --doTheThingThisWay value");
    }
  }
}

The OBJECT_MODE environment variable is a proxy for the -X option. Therefore, it should be parsed and checked in the same manner as the -X option even if it isn't used. This brings consistency between the two tools, which is good. Incosistency is bad: it increases code complexity and potentially can confuse users. What is the benefit to making the two tools inconsistent?

stephenpeckham accepted this revision.Jul 26 2023, 9:22 AM

I don't see any reason to check the OBJECT_MODE environment variable if the -X flag is used. What would the error be: "You specified a valid -X flag, but by the way, OBJECT_MODE is set to an invalid value"?

I think all the commands that examine XCOFF files (llvm-ar, lllvm-ranlib, llvm-readobj, llvm-objdump, llvm-nm, etc.) should recognize "32", "64", "32_64", and "any". I don't think it's necessary to recognize "d64", even if AIX commands do. In addition, I wouldn't bother recognizing an XCOFF file with the magic number for a discontinued 64-bit object. That means that "32_64" and "any" have the same behavior. If -X is specified and does not have one of the 4 specified values, a usage message should be printed. If -X is not specified but OBJECT_MODE is in the environment, a message should be printed if the value is not one of the 4 specified values.

This revision is now accepted and ready to land.Jul 26 2023, 9:22 AM

I don't see any reason to check the OBJECT_MODE environment variable if the -X flag is used. What would the error be: "You specified a valid -X flag, but by the way, OBJECT_MODE is set to an invalid value"?

The error would be "invalid value for OBJECT_MODE environment variable" (or something to that effect), which would mean the user fixes their environment, just the same as if they hadn't used -X. I just want to emphasise though that my main concern is that llvm-ar and llvm-ranlib are being inconsistent - one of them checks the environment variable first, the other checks the command-line option first, and this seems wrong - they should do the same thing. By reading and checking the environment variable first, it simplifies the code logic (no need for HasAIXXOption for example).

As an alternative (but I think adds unnecessary complexity, due to needing an extra variable), you could have both tools read the environment variable into a string variable, then, if the -X option is present, overwrite that variable, and finally feed that string into the parsing code that converts into a BitMode value. If the string is invalid, the parsing code could report an error along the lines of "invalid OBJECT_MODE or -X option value".

I think all the commands that examine XCOFF files (llvm-ar, lllvm-ranlib, llvm-readobj, llvm-objdump, llvm-nm, etc.) should recognize "32", "64", "32_64", and "any". I don't think it's necessary to recognize "d64", even if AIX commands do. In addition, I wouldn't bother recognizing an XCOFF file with the magic number for a discontinued 64-bit object. That means that "32_64" and "any" have the same behavior. If -X is specified and does not have one of the 4 specified values, a usage message should be printed. If -X is not specified but OBJECT_MODE is in the environment, a message should be printed if the value is not one of the 4 specified values.

Yes, to be clear, I'm not advocating that "d64" support should be added (I'm not opposed to it, should there be a use-case for it either).

MaskRay added inline comments.Jul 27 2023, 3:58 AM
clang/test/lit.cfg.py
284

Recent Python style prefers double quotes

DiggerLin marked 3 inline comments as done.Jul 27 2023, 6:41 AM
DiggerLin added inline comments.
llvm/test/tools/llvm-ranlib/aix-X-option.test
18–19

it run fail in AIX OS locally.

llvm/tools/llvm-ar/llvm-ar.cpp
1468

The OBJECT_MODE environment variable is a proxy for the -X option. Therefore, it should be parsed and checked in the same manner as the -X option even if it isn't used. This brings consistency between the two tools, which is good. Incosistency is bad: it increases code complexity and potentially can confuse users. What is the benefit to making the two tools inconsistent?

as I mention before , I will create patches to let the llvm-ar and llvm-nm to work same as logic llvm-ranlib for the env variable "OBJECT_MODE"

DiggerLin marked 2 inline comments as done.Jul 27 2023, 7:41 AM

As an alternative (but I think adds unnecessary complexity, due to needing an extra variable), you could have both tools read the environment variable into a string variable, then, if the -X option is present, overwrite that variable, and finally feed that string into the parsing code that converts into a BitMode value. If the string is invalid, the parsing code could report an error along the lines of "invalid OBJECT_MODE or -X option value".

if I do not think it is better than my current implement, If I implement as your suggestion, I need another variable to recoded the where the string come from(from OBJECT_MODE  or -X option value). otherwise when the invalid value of string , how can I report it is an invalid OBJECT_MODE"  or "invalid -X option value"
DiggerLin added a comment.EditedJul 27 2023, 8:43 AM

I do see the appeal of a single enum parameter in the meantime. I would suggest it should be called SymtabWritingMode and have values NoSymtab, NormalSymtab, BigArchive64Bit, BigArchive32Bit, and BigArchiveBoth. You could optionally drop one of the BigArchive* values, and treat NormalSymtab as that value for BigArchive (as long as "Normal" is clear as to its meaning to someone familiar with BigArchive). All existing WriteSymtab false values would be replaced with NoSymtab and true with NormalSymtab (except for BigArchive cases).

with you suggestion, if I add wrapper and some function member for SymtabWritingMode , there is no different with my current implement with WriteSymTabType.

class WriteSymTabType {
public:
  enum SymtabWritingMode {
    NoSymtab,  // Not write Symbol table.
    Both,     // Write both 32-bit and 64-bit symbol table. for non_AIX, there is always write  both 32-bit and 64-bit symbol table.
    BigArchiveBit32, // Only write the 32-bit symbol table.
    BigArchive64BitBit64  // Only write the 64-bit symbol table.
  };

  WriteSymTabType(bool PrintSym) { Value = PrintSym ? Both : None; }
  void operator=(bool PrintSym) { Value = PrintSym ? Both : None; }
  operator bool() { return Value != None; }
  void setBitMode(SymtabWritingMode BM) { Value = BM; }
  SymtabWritingMode getBitMode() { return Value; }

private:
  SymtabWritingMode Value;
};
llvm/tools/llvm-ar/llvm-ar.cpp
231

Having looked at this again, I really don't like the implicit conversion semantics of the new class, and I don't see why you can't just extend the writeArchiveToBuffer etc parameters to pass in a BitMode parameter.

I am not sure why you do not like the implicit of the conversion semantics of the new class, can you elaborate the disadvantage in the case ?

DiggerLin updated this revision to Diff 544804.Jul 27 2023, 8:58 AM

addressed comment and let llvm-ranlib supports -Xany option

DiggerLin marked an inline comment as done.Jul 27 2023, 10:16 AM

As an alternative (but I think adds unnecessary complexity, due to needing an extra variable), you could have both tools read the environment variable into a string variable, then, if the -X option is present, overwrite that variable, and finally feed that string into the parsing code that converts into a BitMode value. If the string is invalid, the parsing code could report an error along the lines of "invalid OBJECT_MODE or -X option value".

if I do not think it is better than my current implement, If I implement as your suggestion, I need another variable to recoded the where the string come from(from OBJECT_MODE or -X option value). otherwise when the invalid value of string , how can I report it is an invalid OBJECT_MODE" or "invalid -X option value"

I'm not convinced you need to distinguish the two. If a user hasn't specified the -X option, then they'll know it's the OBJECT_MODE environment variable. Even if they have both, it should be obvious to a quick glance of the full error message what the wrong value is (because you'd include it in the message) and you could then it won't take long to find which of the two is wrong (even if they don't know that -X trumps OBJET_MODE). That being said, I still prefer the parse OBJECT_MODE first, report an error, then later read the -X option and check for an error there.

clang/test/lit.cfg.py
284

This also applies to the line above. See discussions on Discourse about using "black" to format python code (it's similar to clang-format, but for python).

llvm/test/tools/llvm-ranlib/aix-X-option.test
18–19

My apologies, it appears that you must be using the external shell or something (see lit differences internal and external shell). That might explain why env -u isn't used that much.

llvm/tools/llvm-ar/llvm-ar.cpp
231

Implicit conversions from primitive types to more complex types are generally, in my experience, a maintenance nightmare. Here are some specific disadvantages to this case:

  • From the caller's perspective, it isn't obvious that a complex type of WriteSymtabType, as declared in the function's signature that they are probably looking at, can be constructed from a boolean.
  • It's not even particularly obvious that implicit conversion is possible by a casual glance at the class definition (since implicit construction is defined by the lack of explicit and it's not always obvious if you are only briefly looking that explicit is missing).
  • WriteSymtabType is essentially an enum. Without looking up the details of how the class works, it's not obvious to the caller what a true or false value maps to. You wouldn't typically construct an enum from integer literal values for this sort of reason - you'd spell out the specific enum value that you intend.
  • You are trying to add support for a new file format to existing code. This is tainting how you do things. If you wrote this code from scratch, you wouldn't use the implicit conversion stuff - you'd use an enum or even a proper class hierarchy. It seems to me like one of the main reasons you're using the complex type is because you don't want to update the various call sites, which, put crudely, is a bit lazy. You should always aim to leave things in at least a no-worse state than when you found it, preferably better.

I could probably come up with other issues if I thought about it more, but really I keep coming back to "if I completely rewrote this code (without object orientation), what would I do?" and the enum wins every time.

DiggerLin updated this revision to Diff 545392.Jul 29 2023, 6:01 PM

As an alternative (but I think adds unnecessary complexity, due to needing an extra variable), you could have both tools read the environment variable into a string variable, then, if the -X option is present, overwrite that variable, and finally feed that string into the parsing code that converts into a BitMode value. If the string is invalid, the parsing code could report an error along the lines of "invalid OBJECT_MODE or -X option value".

if I do not think it is better than my current implement, If I implement as your suggestion, I need another variable to recoded the where the string come from(from OBJECT_MODE or -X option value). otherwise when the invalid value of string , how can I report it is an invalid OBJECT_MODE" or "invalid -X option value"

I'm not convinced you need to distinguish the two. If a user hasn't specified the -X option, then they'll know it's the OBJECT_MODE environment variable. Even if they have both, it should be obvious to a quick glance of the full error message what the wrong value is (because you'd include it in the message) and you could then it won't take long to find which of the two is wrong (even if they don't know that -X trumps OBJET_MODE). That being said, I still prefer the parse OBJECT_MODE first, report an error, then later read the -X option and check for an error there.

It looks like you've ignored this point in your most recent changes. I agree that having a variable to say whether the BitMode variable comes from OBJECT_MODE or -X is not great, and I'd prefer the HasAIXOption style you had before over it. However, I'd still rather the invalid OBJECT_MODE value be read and rejected upfront before even parsing the -X option.

llvm/include/llvm/Object/ArchiveWriter.h
44
45

This comment doesn't make sense for non-Big Archive archives, because regular archives only have one symbol table. There is no concept of a 32- or 64-bit one. Perhaps "Write the symbol table. For the Big Archive format, writes both 32-bit and 64-bit symbol tables."

46–47

I'd prefix these two with BigArchive, or even just name them BigArchive32 and BigArchive64 respectively, to more clearly convey the fact that they are specific to that file format.

llvm/lib/Object/ArchiveWriter.cpp
966

Where this comparison is repeated more than once in the same function, it might be nice to store the value in a local boolean variable for use in all places instead of repeating the comparison.

llvm/test/tools/llvm-ranlib/non-AIX-not-supportedwq-X-option.test
1 ↗(On Diff #545392)

Looks like there's a typo in your test name?

6 ↗(On Diff #545392)

Nit: trailing whitespace

DiggerLin added a comment.EditedAug 1 2023, 5:54 AM

I'd still rather the invalid OBJECT_MODE value be read and rejected upfront before even parsing the -X option.

as Stephen and me point out, if there is -X option, we do not care about the environment env OBJECT-MODE , when user input the -X option , it means , they want to use -X option(they do not want the value from OBJECT-MODE ) ,I do not think llvm-ranlib need to parse the OBJECT-MODE for them. when user input a valid -X option value (but there is invalid OBJECT-MODE value), why llvm-ranlib need to report a invalid OBJECT-MODE value for them. AIX OS ranlib do not do that.

DiggerLin updated this revision to Diff 546071.Aug 1 2023, 8:05 AM
DiggerLin marked 5 inline comments as done.

address comment

DiggerLin marked an inline comment as done.Aug 1 2023, 8:42 AM
DiggerLin added inline comments.
llvm/test/tools/llvm-ranlib/non-AIX-not-supportedwq-X-option.test
1 ↗(On Diff #545392)

yes , it should be non-AIX-not-supported-X-option.test, thanks

I glanced at the patch. The code seems reasonable.

clang/test/lit.cfg.py
276
llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
488 ↗(On Diff #546071)
llvm/test/tools/llvm-ranlib/non-AIX-not-supported-X-option.test
1 ↗(On Diff #546071)

UNSUPPORTED: system-aix

I think the convention is to name aix-X-option.test and non-AIX-not-supported-X-option.test with a shared prefix to make identify such tests easier.

Perhaps the file can be renamed to AIX-X-option-non-AIX.test or similar.

4 ↗(On Diff #546071)

With one prefix, we prefer --check-prefix=

DiggerLin updated this revision to Diff 546444.Aug 2 2023, 6:48 AM
DiggerLin marked an inline comment as done.
DiggerLin marked 3 inline comments as done.
MaskRay added inline comments.Aug 2 2023, 8:40 AM
llvm/tools/llvm-ar/llvm-ar.cpp
82

"Specify". The convention is to use imperative sentences. Also, does the - align with - above?

(The capitalization is a bit inconsistent. For existing llvm-ranlib options, capitalization is more common.)

DiggerLin updated this revision to Diff 546867.Aug 3 2023, 7:44 AM
DiggerLin marked 2 inline comments as done.

clang-format is complaining in the pre-merge CI.

clang/test/lit.cfg.py
283–284

It might be a good idea for you to run the black tool on python changes, to ensure they conform to the coding standard, much like you do with clang-format for C++ changes.

llvm/lib/Object/ArchiveWriter.cpp
901

This makes the variable name more grammatically correct.

llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
488 ↗(On Diff #546071)

Marked as done, but not fully addressed - in this specific style, there should be no space between the comment and thing it's referring to. This is a reason why clang-format will be failing (there may be others).

llvm/tools/llvm-ar/llvm-ar.cpp
74

Strictly speaking, this should be "Big Archive formats only" not "AIX OS only" since theoretically you could have Big Archive archives on other platforms. However, I'm not bothered if you don't want to change this. The same probably goes for other tools for that matter, but don't change them now.

DiggerLin updated this revision to Diff 549938.Aug 14 2023, 7:57 AM
DiggerLin marked 4 inline comments as done.
DiggerLin added inline comments.Aug 14 2023, 8:01 AM
clang/test/lit.cfg.py
283–284

thanks for let me know.

llvm/tools/llvm-ar/llvm-ar.cpp
74

Strictly speaking, this should be "Big Archive format on AIX OS only" ,
in AIX OS , you can still input the -X option , but the X option not work for gnu archive format.

jhenderson accepted this revision.Aug 15 2023, 12:24 AM

I'm going to accept this change, although I still have significant concerns about how the whole parsing logic seems more complicated than it needs to be.

llvm/tools/llvm-ar/llvm-ar.cpp
74

Ah, sorry, I misremembered the code, you are right.

It does raise a question whether the -X option at least should be available on non-AIX platforms, because otherwise there's no way of controlling the symbol table behaviour like there is on AIX. However, that's probably a different patch (series) and not necessarily one we need to worry about unless somebody has an actual use-case.

# REQUIRES: system-aix in llvm/test/tools/llvm-ranlib/aix-X-option.test unfortunately makes most -X tests only work on AIX, which most contributors don't have access to.

Ideally -X is usable when the user specifies --format=bigarchive.

The environment variable OBJECT_MODE can be made AIX specific, though.

llvm/test/tools/llvm-ranlib/aix-X-option.test
2

# REQUIRES: system-aix

MaskRay added inline comments.Aug 15 2023, 11:52 PM
llvm/include/llvm/Object/ArchiveWriter.h
43

Below you use SymtabWritingMode:: for all members, so just make this enum scoped.

llvm/tools/llvm-ar/llvm-ar.cpp
1449

Remove Twine( and ).

1472

C++17 allows if (char *EnvObjectMode = getenv("OBJECT_MODE"))

DiggerLin marked 3 inline comments as done.

@MaskRay, would you have any additional comments to share? If not, would it be appropriate for me to proceed with committing the patch? Your input would be greatly appreciated.

MaskRay accepted this revision.Aug 21 2023, 9:20 AM