Page MenuHomePhabricator

[AIX]Emit function descriptor csect in assembly
ClosedPublic

Authored by Xiangling_L on Aug 25 2019, 6:53 PM.

Details

Summary

This patch emits the function descriptor csect for functions with definitions under both 32-bit/64-bit mode on AIX.

During PPCAIXAsmPrinter::EmitFunctionEntryLabel we emit the csect for each function descriptor.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jasonliu added inline comments.Sep 6 2019, 9:11 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1734 ↗(On Diff #218576)

Minor nit for the naming here:
We seems to use TOC anchor and TOC base to indicate the same thing interchangeably.
It might be better to use one term through out. In the assembler guide for AIX, we use TOC anchor only.
However, in llvm source, we use TOC base instead.
So maybe it's better to just be consistent with the existing source, and use the term TOC base?
Some suggestions on the naming:
Rename TOC to TOCStr
Rename TOCAnchor to TOCBaseSec

1740 ↗(On Diff #218576)

Minor nit: emit TOC base.

sfertile added inline comments.Sep 7 2019, 8:18 AM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1657 ↗(On Diff #218576)

The code for creating the entry point and descriptor symbols and their csects looks good, I don't think it needs any changes for functionality/correctness. I have a suggestion related to style though. We are inserting about 20 lines of AIX specific implementation details into a constructor thats only about 15 lines long before the change. Most developers that read this won't care about the AIX specific details, should we split the AIX entry point and descriptor setup into some helper functions? Something like

if ((MAI->needsFunctionDescriptors()) {
   assert(TM.getTargetTriple().isOSAIX() && "Function descriptor is only"
                                             " supported on AIX.");
  CurrentFnDescSym = getAIXDescriptorSymbol(MF.getFunction(), OutStreamer->getContext());
  CurrentFnSym = getAIXEntryPointSymbol(MF.getFunction(),  cast<MCSectionXCOFF>(getObjFileLowering(), TM);
} else {

keeps this function very readable by having descriptive helper functions bury the implementation details.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1731 ↗(On Diff #218576)

There may be symbol references we could emit in assembly that require a TOC-base. Its not really an issue now but it may be once we implement the integrated assembler for AIX. I'm not sure if its better to be sloppy and simply always emit a TOC-base (an unused one would be benign ...)

1733 ↗(On Diff #218576)
if (M.empty())
  return;

Prefer an early return to nesting all the code inside an if statement.

llvm/test/CodeGen/PowerPC/test_func_desc.ll
15 ↗(On Diff #218576)

This IR is overly verbose, could we replace it with

%1 = call i32 @foo()
%2 = call i32 @foo_extern()
%3 = call i32 @foo_static()
%4 = add nsw i32 %1, %2
%5 = add nsw i32 %4, %3
ret i32 %5

(If I'm missing something from the test, you can generate less verbose IR by feeding this through opt -S -mem2reg)

59 ↗(On Diff #218576)

should this be main:? I believe we are checking for label for the symbol main.

74 ↗(On Diff #218576)

ditto. (static_foo:)

sfertile added inline comments.Sep 7 2019, 8:34 AM
llvm/lib/MC/MCExpr.cpp
445 ↗(On Diff #218576)

I'm not aware of any other square brackets used in symbol references on AIX, but I'm still pretty new to AIX assembly so that doesn't mean there isn't any.

We *could* add a bitfield to the MCSymbolRefExpr (similar to the UseParensForSymbolVariant field, but then we likely just have the same 'if' in the SymbolRef constructor to set the new bitfield.

jasonliu added inline comments.Sep 9 2019, 7:45 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1731 ↗(On Diff #218576)

I have a slight preference on emitting TOC-base on a needed base, and deal with the issue that integrated assembler introduce when we see one. In this way, we know exactly when we need to emit TOC-base and when not.
And yes, an unused TOC-base would just get garbage collected by linker; GCC on AIX would also always emit a TOC-base. Hence, it's a slight preference on my side.

DiggerLin added inline comments.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
673 ↗(On Diff #218576)

if we can use if (TM.getTargetTriple().isOSAIX() && here, we do not need needsFunctionDescriptors() function and variable needsFunctionDescriptors?

1657 ↗(On Diff #218576)
  1. why using "if (MAI->needsFunctionDescriptors()) " here . can we using if(TM.getTargetTriple().isOSAIX) { here. I think for all aix assembly, they all need function description . and then we can delete the needsFunctionDescriptors from the patch ? and we can also delete assert(TM.getTargetTriple().isOSAIX() && "Function descriptor is only supported on AIX.");
  2. I have another suggestion is that we make void PPCAIXAsmPrinter::SetupMachineFunction(MachineFunction &MF) as virtual function(and keep the content), and add new function AsmPrinter::SetupMachineFunction(MachineFunction &MF), put the new content in the PPCAIXAsmPrinter::SetupMachineFunction(MachineFunction &MF).
llvm/lib/MC/MCSectionXCOFF.cpp
20 ↗(On Diff #218576)

since we will have a lot of StorageMappingClass value, can we use a macro like
#define ECASE(X) case XCOFF::XMC_##X: return #X
switch (SMC) {

ECASE(DS);
ECASE(RW);

}

or we can use array StorageMappingClassArray = {"PR", "RO","DB".......,"TE"};
return StorageMappingClassArray [SMC];

it maybe more simple or readable.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1724 ↗(On Diff #218576)

from the begin of the function to here, it actually emit a function description, can we add a new function and put the content into the new function and invoke the new function here ?

llvm/include/llvm/MC/MCExpr.h
238 ↗(On Diff #218576)

The comment is meant to express what the variant looks like when printed (thus, @ does not match). In any case, I have doubts that this is the correct way to represent [TC0]. We can look at TOC[TC0] as referring less to the symbol, but more referring to the csect. In any case, the TC0 is the storage mapping class, which we have been using uppercase for.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1657 ↗(On Diff #218576)

@DiggerLin wrote:

why using "if (MAI->needsFunctionDescriptors()) " here . can we using if(TM.getTargetTriple().isOSAIX) { here. I think for all aix assembly, they all need function description . and then we can delete the needsFunctionDescriptors from the patch ?

From the description of the MCAsmInfo class:

/// This class is intended to be used as a base class for asm
/// properties and features specific to the target.

Adding a hook to MAI for descriptors and querying that is a valid way to represent this. That said, there are other places in this file that change the codegen based on explicit check of either the OS or the binary format. Personally I slightly prefer the way the patch is doing this already mainly because needsFunctionDescriptors() makes the code self-documenting as opposed to getTargetTriple().isOSAIX().

@DiggerLin wrote:

  1. I have another suggestion is that we make void PPCAIXAsmPrinter::SetupMachineFunction(MachineFunction &MF) as virtual function ...

This suggestion of making SetupMachineFunction virtual is better then my suggestion of a local helper function.

sfertile added inline comments.Sep 11 2019, 7:52 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1724 ↗(On Diff #218576)

A good suggestion. I think we should take it a bit farther though. Add an AsmPrinter::EmitDescriptor() virtual function. Calling the base class implementation should be a fatal_error. Then somewhere we can explicitly check the MCAsmInfo if descriptors are needed, and call this new virtual if they are. My suggestion would be in runOnMachineFunction.

hubert.reinterpretcast requested changes to this revision.Sep 11 2019, 11:29 AM
> cat toccommon.c
int TOC;
void f(void) { }

> ./bin/clang toccommon.c -target powerpc64-ibm-aix -m32 -fomit-frame-pointer -S
clang-10: /redacted/llvm/include/llvm/MC/MCSymbolXCOFF.h:41: void llvm::MCSymbolXCOFF::setContainingCsect(llvm::MCSectionXCOFF*): Assertion `(!ContainingCsect || ContainingCsect == C) && "Trying to set a containing csect that doesn't match the one that" "this symbol is already mapped to."' failed.
This revision now requires changes to proceed.Sep 11 2019, 11:29 AM
Xiangling_L marked 20 inline comments as done.Sep 11 2019, 11:59 AM
Xiangling_L added inline comments.
llvm/include/llvm/MC/MCExpr.h
238 ↗(On Diff #218576)

Thanks, I will update the comment. But I verified on AIX with Clangtana and GCC, it's TOC[tc0] as TOC base address in function descriptor.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
673 ↗(On Diff #218576)

Yes, querying target will lead to correct result as well, but using MAI->needsFunctionDescriptors() is the way that we embed target-specific feature to target-independent file AsmPrinter.cpp and I think it is more self-explanatory.

llvm/lib/MC/MCSectionXCOFF.cpp
20 ↗(On Diff #218576)

I like your idea to make this snippet of code simpler and more readable. But it seems that StorageMappingClassArray cannot handle report_fatal_error situation. And personally, I feel it's okay to keep it this way by wasting a little bit more space rather than spending time searching and reading macro.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1724 ↗(On Diff #218576)

Some description of function EmitFunctionEntryLabel():

This is a virtual function to allow targets to do their wild and crazy things as required.

So my understanding is on AIX, we use this function to emit function descriptor as the comment specifies.

1731 ↗(On Diff #218576)

Currently, I prefer leaving it to emit .toc when needed as Jason said we may want to know exactly when we need to emit TOC anchor. But I will leave this discussion open and see if there are more input about this.

Xiangling_L marked 2 inline comments as done.

Address 2nd round comments

Chang TOC base section name to ".toc" and fix testcase assertion error

Minor changes

llvm/include/llvm/MC/MCExpr.h
238 ↗(On Diff #218576)

XL generates TOC{TC0}, not TOC[tc0].

hubert.reinterpretcast requested changes to this revision.Sep 11 2019, 4:18 PM
hubert.reinterpretcast added inline comments.
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1762 ↗(On Diff #219783)

Why are we naming a symbol .toc here and not really doing anything with it here or elsewhere in the patch?

1764 ↗(On Diff #219783)

The csect name is "TOC" in the symbol table, but we're naming it .toc. here.

This revision now requires changes to proceed.Sep 11 2019, 4:18 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1678 ↗(On Diff #219783)

Please add a comment to explain why this does not call the base class version of SetupMachineFunction and instead has some copy/paste. Alternatively, call the base class version if possible.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1739 ↗(On Diff #219783)

This comment is in the wrong place. The comment for this line should be "[e]mit entry point address".

1742 ↗(On Diff #219783)

I do not believe that the TOC base is a variant of unrelated symbols coincidentally named "TOC".

llvm/test/CodeGen/PowerPC/test_func_desc.ll
2 ↗(On Diff #219783)

--check-prefixes=CHECK,32BIT

5 ↗(On Diff #219783)

Ditto.

60 ↗(On Diff #219783)

This is technically not needed for this case, and XL does not generate this unnecessary function descriptor.

sfertile added inline comments.Sep 12 2019, 8:34 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1762 ↗(On Diff #219783)

Do we need a TOC symbol to represent the TOC base? Is the TOC base a csect with no contained symbol, or is it similar to the common csects where there is an implicit symbol with the same name contained in the csect? It seems like we really only need the symbol for our internal MC representation, in which case the name doesn't matter outside of being unique and not clashing with anything in the users namespace.

1724 ↗(On Diff #218576)

I think we are pushing past the limits of what the comment intended. I take it to mean 'the wild and crazy function entry point things`. The way the ELF V1 abi handles descriptors (with the entry point being a local symbol, and the descriptor being the global symbol) it kindof squeaks into this definition. For AIX we are emitting 2 distinct symbols, one of which has no relation to the function entry point symbol other then referencing its address.

sfertile added inline comments.Sep 12 2019, 10:14 AM
llvm/include/llvm/MC/MCExpr.h
238 ↗(On Diff #218576)

According to the assembly language ref:

A qualname operand takes the form of:
symbol[XX]
OR
symbol{XX}

So both bracketing formats are acceptable, I believe we have been using '[XX]' so far in llvm.

The assembly language ref also uses both uppercase and lowercase spellings for the storage mapping class, but I believe we have used uppercase exclusively so far in llvm.

llvm/include/llvm/MC/MCExpr.h
238 ↗(On Diff #218576)

Yes, and the group of comments was regarding the capitalization. The comment indicating the bracketing used was a counterexample to the response made to my comment regarding the capitalization.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1742 ↗(On Diff #219783)

Jason and I discussed this off-list, and we believe this to be too much of a hack. In the long term, we probably need some way to separate things that are defined as a label (or QualName without a storage mapping class) from things that are defined by a QualName that has a storage mapping class (like TOC entries). The way a reference should be written in the assembly depends on the form of the definition. One of the decisions to be made is whether they are all MCSymbols or only some of them are MCSymbols. If they are all MCSymbols then we need to differentiate them on more than just the plain name. As far as the system assembler behaviour is concerned, even the name and storage mapping class is not sufficiently unique (it is possible to generate an external reference to a symbol whose name and storage mapping class is the same as another symbol in the assembler source file). A compromise that might be okay for now is to write the csect reference using a symbol named "TOC[TC0]".

1762 ↗(On Diff #219783)

Since this needs a QualName with the storage mapping class for referencing in the assembly, it is more the former (a csect with no contained symbol).

Note that we could choose to use a QualName with the storage mapping class for the .comm-produced csects too, but references to the csect need to match the form used on the .comm pseudo-op. Since the form used on the .comm pseudo-op affects how references look, I think not using the QualName is the right behaviour in terms of being the least surprising.

As for the question of whether we need a symbol to represent the TOC base, I think the question is better asked with regards to how we want to represent references to the TOC base in the MC representation, the mechanism for generating the symbol table entry in the binary emission, and how we will make those references refer to the correct symbol table entry in the binary emission.

DiggerLin added inline comments.Sep 13 2019, 6:26 AM
llvm/include/llvm/MC/MCSymbolXCOFF.h
38 ↗(On Diff #219783)

why delete const here ?

jasonliu added inline comments.Sep 13 2019, 3:03 PM
llvm/include/llvm/MC/MCSymbolXCOFF.h
38 ↗(On Diff #219783)

Please refer to the const_cast comment I made earlier.

Xiangling_L marked 22 inline comments as done.Sep 16 2019, 8:51 AM
Xiangling_L added inline comments.
llvm/test/CodeGen/PowerPC/test_func_desc.ll
60 ↗(On Diff #219783)

Yes, this function descriptor is unnecessary. But I vaguely recall that we decided the optimization of function descriptor generation would not be taken into consideration temporarily. So I am thinking is that okay to leave it as it is, and remove this part when we have optimization implemented?

hubert.reinterpretcast marked an inline comment as done.Sep 16 2019, 5:19 PM
hubert.reinterpretcast added inline comments.
llvm/test/CodeGen/PowerPC/test_func_desc.ll
60 ↗(On Diff #219783)

Sure.

Xiangling_L marked an inline comment as done.

Address comments

Xiangling_L marked 2 inline comments as done.

add EmitFunctionDescriptor()

Delete "EmitFunctionEntryLabel()" target-specific one for AIX

jasonliu added inline comments.Sep 19 2019, 1:07 PM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
770 ↗(On Diff #220872)

This could be done in the header instead. People could know it's a platform specific thing when reading the header, no need to jump to the source file.
Also I think we could use llvm_unreachable in this situation instead to get minor optimization advantage, but no big deal.

llvm/lib/MC/MCAsmInfoXCOFF.cpp
28 ↗(On Diff #220872)

We used isValidUnquotedName when printing symbol name in this context:
...

StringRef Name = getName();
if (!MAI || MAI->isValidUnquotedName(Name)) {
  OS << Name;
  return;
}

if (MAI && !MAI->supportsNameQuoting())
  report_fatal_error("Symbol name with unsupported characters");

...

If we always return true for isValidUnquotedName then we won't be able to detect symbol name with unsupported characters.
On that note, we should probably set SupportsQuotedNames to be false for AIX target.
Using TOC[TC0] as symbol name is kinda a hack since XCOFF symbol could not contain '[' AFAIK.
So maybe we would override isValidUnquotedName and make it very similar to the original one, except '[' is a valid Name temporarily. Add a FIXME to say remove this whole function when we don't use TOC[TC0] as symbol any more.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1739 ↗(On Diff #220872)

Minor nit: we don't need the "return".

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1739 ↗(On Diff #220872)

I do not believe there is definitive coding standard guidance on this, but when deferring the rest of the control flow and the return value in a function to another function, using return is my preference even in the case of returning void (if both the caller and callee return void).

Xiangling_L marked 4 inline comments as done.Sep 20 2019, 12:54 PM

Edit isValidUnquotedName to return true only for TOC symbol '"TOC[TC0]" on AIX

jasonliu accepted this revision.Sep 23 2019, 7:58 AM

LGTM with some minor change.

Should SupportsQuotedNames return false on AIX?

llvm/lib/MC/MCAsmInfoXCOFF.cpp
29 ↗(On Diff #221101)

nit: remove space in " TOC[TC0]".

Xiangling_L marked an inline comment as done.

minor changes

llvm/lib/MC/MCAsmInfoXCOFF.cpp
30 ↗(On Diff #221386)

Minor nit: Add "a" before "symbol name". Capitalize "remove".
Change "don't use [ ... ] anymore" to "stop using [ ... ]".

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1680 ↗(On Diff #221386)

I think the base class SetupMachineFunction might be usable if it called two functions in order to setup a function descriptor symbol as necessary, and then retrieve the entry point symbol. Although the copy/paste is currently one line, it might grow in the future.

1768 ↗(On Diff #221386)

Please name the section "TOC" and not "TOC[TC0]". The section name is not used by the assembly generation, but we might find a use for it for the object generation.

Xiangling_L marked 2 inline comments as done.

Refactor the "SetupMachineFunction" to make it usable as a base class version.

Xiangling_L marked an inline comment as done.Sep 25 2019, 10:09 AM
llvm/include/llvm/CodeGen/AsmPrinter.h
115 ↗(On Diff #221800)

runOnMachineFunction reference seems out-of-date with the current patch.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1665 ↗(On Diff #221800)

Assert also that CurrentFnDescSym is set.

llvm/lib/MC/MCAsmInfoXCOFF.cpp
30 ↗(On Diff #221800)

s/remove/Remove/;
s/any more//;

Xiangling_L marked 3 inline comments as done.

Correct some comments;
Add an assertion for CurrentFnDescSym.

This revision is now accepted and ready to land.Sep 26 2019, 8:47 AM

LGTM. Thanks.

Thank you for your reviews.

Closed by commit rL373009: [AIX]Emit function descriptor csect in assembly (authored by xiangling_liao, committed by ). · Explain WhySep 26 2019, 12:37 PM
This revision was automatically updated to reflect the committed changes.