Page MenuHomePhabricator

[XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s)
ClosedPublic

Authored by jasonliu on Jun 24 2020, 10:19 AM.

Details

Summary

When a desired symbol name contains invalid character that the system assembler could not process, we need to emit .rename directive in assembly path in order for that desired symbol name to appear in the symbol table.

Diff Detail

Event Timeline

jasonliu created this revision.Jun 24 2020, 10:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2020, 10:19 AM
jasonliu updated this revision to Diff 273426.Jun 25 2020, 10:11 AM

Address clang-tidy comments.

llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
79

Minor nit: Trailing whitespace at the end of line (multiple instances of OBJ-EMPTY exhibit this).

daltenty added inline comments.Jun 25 2020, 2:37 PM
llvm/lib/MC/MCAsmStreamer.cpp
820

Previously we avoided emitting this directive for some case where it wasn't strictly required (i.e. function descriptors). Why the change in behaviour? (at first glance seems somewhat orthogonal to this patch)

jasonliu marked an inline comment as done.Jun 25 2020, 4:15 PM
jasonliu added inline comments.
llvm/lib/MC/MCAsmStreamer.cpp
820

If we don't emit this directive for function descriptor, then we would not trigger the rename for that .lglobl function descriptor when needed. While not having rename for the HIDE_EXT function descriptor is not a huge problem, we would have inconsistency in the object file generation path (object file generation will use the original name when available.).

jasonliu marked an inline comment as done.Jun 25 2020, 4:31 PM
jasonliu added inline comments.
llvm/lib/MC/MCAsmStreamer.cpp
820

And I think previously we choose to skip emitting the .lglobl for function descriptor because we have it as a label, and we would get a unnecessary symbol in the symbol table because of that. But now we have function descriptor as csect, so emitting .lglobl for it is redundant but do not have real effect. So I think it might be just easier to emit it and keep everything consistent.

jasonliu updated this revision to Diff 273713.Jun 26 2020, 6:52 AM

Remove trailling whitespace.

jasonliu marked an inline comment as done.Jun 26 2020, 6:52 AM
DiggerLin added inline comments.Jun 26 2020, 7:53 AM
llvm/include/llvm/MC/MCSymbolXCOFF.h
64

even if there is some Overhead, using hasRename() can let user know what the getSymbolTableName() do ?

llvm/lib/MC/MCAsmStreamer.cpp
873

I do not think print out rename directive here is a good idead, as the function name :emitCommonSymbol suggestion , it just print the .comm directive, and it is common function for all the target(not only for XCOFF).

I think it maybe better to put the code in the function emitGlobalVariable() before we call the OutStreamer->emitCommonSymbol(GVSym, Size, Align);

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1613

I think it maybe better to call emitSymbolAttribute(GVSym, MCSA_LGlobal) and then return directly,

Reason:

  1. for the .lglobl do not have have visibility, it do not need to go to emitXCOFFSymbolLinkageWithVisibility() ,
  2. we can move the code if (cast<MCSymbolXCOFF>(Symbol)->hasRename()) OutStreamer->emitXCOFFRenameDirective(Symbol, cast<MCSymbolXCOFF>(Symbol)->getSymbolTableName());

into this function.

  1. if you call emitSymbolAttribute directly, we do not need to modify the function emitXCOFFSymbolLinkageWithVisibility() in this patch.
jasonliu marked 2 inline comments as done.Jun 26 2020, 8:35 AM
jasonliu added inline comments.
llvm/lib/MC/MCAsmStreamer.cpp
873

As I mentioned in the other comment, emitGlobalVariable() gets called on both assembly path and object file generation path as well. So we could not directly call emitXCOFFRenameDirective() there because we only wants to do it on the assembly generation path.
I understand it is common function for all the target, but it doesn't mean we could not do target specific staff in this function (See all the MAI calls).
I would argue for XCOFF specifically, emitCommonSymbol would need to tight together with this .rename everywhere (meaning if you need to emit a .comm directive, you need to check if the symbol used have a rename, if so emit .rename), so it kinda of make sense to do it here, as oppose to making sure every future use of emitCommonSymbol call on the assembly path accompany by a emitRename call below.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1613

We only want to emitXCOFFRenameDirective when it's emitting assembly though.
And emitLinkage is called on both assembly generating path and object generation path, so we can't move that call directly to here.
It's still possible to make it happen by wrapping emitXCOFFRenameDirective with another new function(something like handleXCOFFRename?), and have that function do emitXCOFFRenameDirective on the assembly generation side, but do nothing on the object file generation side.
But I don't find adding that new function to be particular appealing though (we need to add so many extra boilerplate to tell other target that this function is not meant for them, we already did it for emitXCOFFRenameDirective but that's necessary evil because that directive is indeed not supported by other target).

jasonliu updated this revision to Diff 273752.Jun 26 2020, 9:03 AM

Use rename() when possible.

jasonliu marked an inline comment as done.Jun 26 2020, 9:03 AM
Xiangling_L added inline comments.Jun 26 2020, 9:58 AM
llvm/lib/MC/MCContext.cpp
311

s/have use/have or s/have use/use?

326

I think some comments here explaining the way we handle the name conflict may lead to entry point and function descriptor symbols rename mismatch may help.

jasonliu updated this revision to Diff 273774.Jun 26 2020, 10:22 AM

Adjust comments.

jasonliu marked 2 inline comments as done.Jun 26 2020, 10:22 AM
DiggerLin added inline comments.Jun 26 2020, 10:47 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1613

If we change the function

void MCStreamer::emitXCOFFRenameDirective(const MCSymbol *Name,
                                          StringRef Rename) {
  llvm_unreachable("emitXCOFFRenameDirecitve is only supported on "
                   "XCOFF targets");
}

to

void MCStreamer::emitXCOFFRenameDirective(const MCSymbol *Name,
                                          StringRef Rename) {

}

and object generation path will do nothing for the emitXCOFFRenameDirective(), it just some overhead for object generation path but the logic will be more clear.

jasonliu marked an inline comment as done.Jun 26 2020, 11:07 AM
jasonliu added inline comments.
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1613

Object generation path will take the one in MCXCOFFStreamer.h, which currently doing report_fatal_error.
I'm doing report_fatal_error because we do not need to invoke that function for now. That function will need to set the symbol table name for a rename when we implement the AsmParser for XCOFF and it could not be an empty function that does nothing for that purpose.

llvm/lib/MC/MCContext.cpp
313

Although this encoding generates runs of underscores, it does make it less intrusive in terms of being able to use c++filt on the result. Note that collisions increase with non-Latin scripts (which could be reduced if we append the hex values of all the underscores, including the original ones, to the end of the string).

namespace ImplVerTags {

typedef struct
一世
{} A;
typedef struct
二世
{} B;

}

template <typename> void *implStaticDataUnsafe = nullptr;

template void *implStaticDataUnsafe<ImplVerTags::A>;
template void *implStaticDataUnsafe<ImplVerTags::B>;
jasonliu marked an inline comment as done.Jun 26 2020, 12:12 PM
jasonliu added inline comments.
llvm/lib/MC/MCContext.cpp
313

Just want to make sure I understand your suggestion:
so if I have a "f`!o" as symbol name, I would get back _Renamed..6021f__o back?
Do we want to consider just adding those hex values up? Like getting back _Renamed..81f__o for that?

jasonliu marked an inline comment as done.Jun 26 2020, 12:23 PM
jasonliu added inline comments.
llvm/lib/MC/MCContext.cpp
313

If we choose to get the results like _Renamed..6021f__o, in what way I could trigger a collision? Do we need to have a collision handling mechanism then?

llvm/lib/MC/MCContext.cpp
313

Hmm, I suggested appending but adding to the front does seem to make it easier to eyeball the split (in addition to making the more relevant part show up first for C identifiers in non-Latin scripts). I like it. Also, yes, I meant that f_$ would become (with the prefix) _Renamed..5f24f__.

We shouldn't end up with collisions in that case. The transformation is completely reversible by identifying the number of underscores to determine the prefix length and then replacing each underscore with the encoded value.

346

Moot point if we're going with the encoding change, but the collision handling mechanism needs to check that the output name is not also going to collide:

void f_$o() { }
void f$_o() { }
void f_$o1() { }
llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
3

Can't help but notice that none of the renamed entities are referenced via the TOC.

For example:

int f$o();
int (*bar())() { return f$o; }
jasonliu marked 2 inline comments as done.Jun 29 2020, 9:55 AM
jasonliu added inline comments.
llvm/lib/MC/MCContext.cpp
346

Thanks. Now I like the not-collidable renaming mechanism much more when I know I avoided a while loop to check if the ouput name is going to collide.

llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
3

Thanks for pointing it out. Indeed I missed handling TOC entry renaming in this patch.

jasonliu updated this revision to Diff 274163.Jun 29 2020, 10:04 AM

Change encoding mechanism for symbol renaming.

jasonliu marked an inline comment as done.Jun 29 2020, 10:07 AM
jasonliu added inline comments.
llvm/lib/MC/MCContext.cpp
313

Thanks for the clarification. Changed the renaming encoding using the idea in our discussion.

DiggerLin accepted this revision.Jul 2 2020, 7:33 AM

LGTM ,but it need other person to approve it before landing.

llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
162

minor: I do not think we need the line .

This revision is now accepted and ready to land.Jul 2 2020, 7:33 AM
jasonliu marked an inline comment as done.Jul 2 2020, 8:04 AM
jasonliu added inline comments.
llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
162

I would prefer to keep that line to show the whole picture of a common symbol.

llvm/include/llvm/MC/MCStreamer.h
580

I would suggest not to capitalize "Internal" if there is no clear special definition of that word that this is referencing.

Suggestion:
The name used internally in the assembly for references to the symbol.

581

s/at end/at the end/;

llvm/lib/MC/MCAsmInfoXCOFF.cpp
49

Please call isAlnum.

llvm/lib/MC/MCAsmStreamer.cpp
803

See other comment about missing space before left parenthesis.

848

To escape double quote (") characters, the character should be doubled. Unlike with names that start with digits, Clang does manage to handle quotes in names.

871

Minor nit: The space is missing before the first opening parenthesis.

873

We're performing the same cast/type check multiple times here...

Suggestion:

MCSymbolXCOFF *XSym = dyn_cast<MCSymbolXCOFF>(Symbol);
if (XSym && XSym->hasRename()) {
llvm/lib/MC/MCContext.cpp
317

Don't try to access the first character without locally ensuring that the string is non-empty. For example, ICC produces an "internal error" (https://godbolt.org/z/zLiJqb). It seems Clang avoids generating such by diagnosing empty string literals used as __asm__ names.

321

Replace the comma at the end of the line with a semicolon.

llvm/lib/MC/MCStreamer.cpp
1076

s/emitXCOFFRenameDirecitve/emitXCOFFRenameDirective/;

llvm/lib/MC/MCSymbolXCOFF.cpp
22

s/representatation/representation/;

37

s/representatation/representation/;

llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp
123–133
if (MCSymbolXCOFF *XSym = dyn_cast<MCSymbolXCOFF>(S)) {
llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
2

s/symbol contains/symbols containing/;
s/invalid character/invalid characters/;

3

s/XCOFF platform/an XCOFF platform/;

4

s/32 bit/32-bit/;
s/64 bit/64-bit/;

5

s/test/tests/;
s/64 bit/64-bit/;
s/is/are/;

jasonliu updated this revision to Diff 275463.Jul 3 2020, 2:30 PM
jasonliu marked 18 inline comments as done.

Address comments.

jasonliu added inline comments.Jul 3 2020, 2:35 PM
llvm/lib/MC/MCContext.cpp
317

Thanks for the example and also trying it out with Clang.

LGTM with minor nit; thanks.

llvm/lib/MC/MCAsmStreamer.cpp
851

Add "a" before "double quote".
s/chacacter/character/;

This revision was automatically updated to reflect the committed changes.
jasonliu marked an inline comment as done.Jul 6 2020, 8:50 AM