Page MenuHomePhabricator

[IR] Avoid the need to prefix MS C++ symbols with '\01'
ClosedPublic

Authored by rnk on Feb 19 2015, 3:57 PM.

Details

Summary

Now the Windows mangling modes ('w' and 'x') do not do any mangling for
symbols starting with '?'. This means that clang can stop adding the
hideous '\01' leading escape. This means LLVM debug logs are less likely
to contain ASCII escape characters and it will be easier to copy and
paste MS symbol names from IR.

Finally.

For non-Windows platforms, names starting with '?' still get IR
mangling, so once clang stops escaping MS C++ names, we will get extra
'_' prefixing on MachO. That's fine, since it is currently impossible to
construct a triple that uses the MS C++ ABI in clang and emits macho
object files.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk updated this revision to Diff 20352.Feb 19 2015, 3:57 PM
rnk retitled this revision from to Mangler: Don't prefix MS ABI names with a leading ? character.
rnk updated this object.
rnk added reviewers: rafael, majnemer.
rnk added a subscriber: Unknown Object (MLST).
rnk updated this revision to Diff 20357.Feb 19 2015, 4:34 PM
  • Remove the check for stdcall/fastcall mangling
majnemer edited edge metadata.Feb 19 2015, 4:35 PM

LGTM but I'd like @rafael to look at this as well.

rafael edited edge metadata.Feb 19 2015, 5:47 PM

This would change the result of

@"?foo" = private global i32 42

no? In particular it would miss the .L on ELF.

lib/IR/Mangler.cpp
118 ↗(On Diff #20357)

You can move this before " Prefix = ..";

rnk added a comment.Feb 20 2015, 9:57 AM

This would change the result of

@"?foo" = private global i32 42

no? In particular it would miss the .L on ELF.

Yes. The lack of .L is concerning, but why have we been able to get by without the L prefix on COFF for so long with these names? Is there a bug there too? Maybe we should do the private symbol prefixing even in the presence of '?' or '\01'?

no? In particular it would miss the .L on ELF.

Yes. The lack of .L is concerning, but why have we been able to get by without the L prefix on COFF for so long with these names? Is there a bug there too? Maybe we should do the private symbol prefixing even in the presence of '?' or '\01'?

I think the verifier should reject globals with a \0 name that are
internal or private. Globals with internal and private linkage can be
renamed. It makes no sense to force a particular name. I think I have
a wip patch for this on git. I will try to find it after lunch.

The objc implementation in clang used to break this, but I think it is
now fixed.

Not adding a L prefix for a genuine private global on COFF does look like a bug.

Cheers,
Rafael

The WIP patch is attached. Unfortunately clang still produces bogus
GVs for objc. That would have to be fixed first.

rnk updated this revision to Diff 20893.Feb 27 2015, 1:38 PM
rnk edited edge metadata.
  • Ensure symbols starting with '?' get private prefixes
rnk added a comment.Feb 27 2015, 1:41 PM

So, I don't really want to step into the Obj-C vs. LLVM IR mangler mess. I fixed this patch so that symbols starting with '?' can get a private prefix.

This patch has some functionality change on MachO, where symbols that started with '?' will now lose their prefix, but I think that's acceptable. OK to commit?

rnk added a comment.Mar 2 2015, 11:05 AM

Monday ping

This needs at least three extra tests:

  • Showing that ELF still have a .L prefix when given something like '@"?foo" private global i32 42'
  • Similar for MachO, it should still get a L prefix
  • A test showing the difference that now a MachO GV starting with ? will *not* get a _ prefix.

It might also be nice to have a langref/release notes update.

And last but not least, Grosbach is probably the one to approve the MachO change.

rnk updated this revision to Diff 21153.Mar 3 2015, 3:52 PM
  • Add langref, release notes, macho tests, and elf tests
rafael added inline comments.Mar 3 2015, 4:36 PM
test/CodeGen/X86/mangle-question-mark.ll
23 ↗(On Diff #21153)

You need to CHECK the non-stub too. Probably the easiest is to just check with global variables instead of functions.

rnk added inline comments.Mar 4 2015, 2:42 PM
test/CodeGen/X86/mangle-question-mark.ll
23 ↗(On Diff #21153)

I'd rather just check the .indirect_symbol directive.

rnk updated this revision to Diff 21235.Mar 4 2015, 2:42 PM
  • Check the stubs
rnk added a comment.Mar 11 2015, 2:27 PM

Thoughts, Jim?

grosbach edited edge metadata.Mar 12 2015, 5:11 PM

Conceptually, it's a bit odd to me that something for windows changes macho output all, but I can see why it happens here.

That aside, this is probably OK. I suggest also asking John McCall, as he knows far more than I about what sorts of cleverness front ends may get up to with symbol naming.

espindola edited reviewers, added: espindola; removed: rafael.Mar 15 2018, 11:00 AM
rnk retitled this revision from Mangler: Don't prefix MS ABI names with a leading ? character to [IR] Avoid the need to prefix MS C++ symbols with '\01'.Mar 15 2018, 3:18 PM
rnk edited the summary of this revision. (Show Details)
rnk edited reviewers, added: pcc; removed: grosbach.
rnk updated this revision to Diff 138635.Mar 15 2018, 3:19 PM
  • rebase
  • rewrite
  • Leave MachO alone
rnk added a comment.Mar 15 2018, 3:19 PM

PTAL, let's try to do this again. I'm tired of having to edit out the '\01' escape when reading IR names.

rnk updated this revision to Diff 138653.Mar 15 2018, 4:53 PM
  • Rename DataLayout helper
  • Fix bug in non-GlobalValue mangling path found in clang indexer

While having a symbol table resolved some of the issues, I still think we should aim for the IR having real symbol names one day. It would make it impossible for two GVs to have the same symbol and much easier to reason about which GVs are used from inline assembly.

So while this is an improvement, wouldn't it be better to try to switch the produces to using MM_None?

rnk added a comment.Mar 15 2018, 5:41 PM

I guess the only consideration is, are we ready for clang to emit "_malloc", "_main", and "_printf". I'm not sure I have the consensus to make that change.

Clang already does the __fastcall etc mangling in the frontend. It would be nice to not have to add escapes there as well. As long as we have these escapes, there will be bugs.

I doubt there are any other frontends that care about the fastcall mangling, and we can provide this API if they want to reuse our mangling logic.

I would say this is at least an incremental step towards that. After this, I will update all the Clang IRGen test cases to not expect the '\01' escape for MS ABI code. Then there will be fewer cases to update when we eventually get consensus to go to MM_None. Reasonable?

In D7775#1039687, @rnk wrote:

I guess the only consideration is, are we ready for clang to emit "_malloc", "_main", and "_printf". I'm not sure I have the consensus to make that change.

If I recall correctly, last time the only objection was from the MachO camp. Since the mangling mode is part of the datalayout now, we could leave MachO alone which would already at least simplify the implementation.

I would say this is at least an incremental step towards that. After this, I will update all the Clang IRGen test cases to not expect the '\01' escape for MS ABI code. Then there will be fewer cases to update when we eventually get consensus to go to MM_None. Reasonable?

Yes, I am sold. I am taking a look at the implementation.

espindola accepted this revision.Mar 15 2018, 6:58 PM

LGTM with just a nit.

llvm/lib/IR/Mangler.cpp
146 ↗(On Diff #138653)

You don't need to set Prefix in here, just MSFunc. With that you also don't need the {}.

This revision is now accepted and ready to land.Mar 15 2018, 6:58 PM
smeenai added inline comments.Mar 16 2018, 1:33 AM
llvm/docs/LangRef.rst
1957 ↗(On Diff #138653)

To clarify, I assume x is used on x86 and w is used everywhere else. stdcall and fastcall are only a thing on x86, in which case w shouldn't be mangling them? (This is more confusing with the old version of the comment, but it's still ambiguous here.)

rnk added a comment.Mar 16 2018, 12:59 PM

I hadn't considered switching just Windows to MM_None, that's is an interesting idea. Usually Windows has the most bugs involving '\01' because of how prevalent it is on Windows, but if we do MM_None that will all go away, and only Darwin will suffer them. =p

llvm/lib/IR/Mangler.cpp
146 ↗(On Diff #138653)

Yep, will do.

rnk added inline comments.Mar 16 2018, 1:07 PM
llvm/docs/LangRef.rst
1957 ↗(On Diff #138653)

__vectorcall gets @@N on x64, so the way I think of it is, all these conventions get manglings on Windows, but __fastcall and __stdcall are ignored on x64. It just so happens that we need to do some of this ignoring in the IR mangler, but we could just as easily reject these CCs in the verifier since they should be meaningless for the target.

This revision was automatically updated to reflect the committed changes.