This is an archive of the discontinued LLVM Phabricator instance.

[flang] Add the proposal document and rationale for the internal naming module that was previously added.
ClosedPublic

Authored by schweitz on Apr 29 2020, 7:15 AM.

Details

Summary

This document describes how uniquing of internal names is done. This
name uniquing is done to support the constraints and invariants of the FIR
dialect of MLIR.

Event Timeline

schweitz created this revision. Apr 29 2020, 7:15 AM
Herald added a project: Restricted Project.
schweitz retitled this revision from "Add the proposal document and rationale for the internal naming module that was previously added." to "[flang] Add the proposal document and rationale for the internal naming module that was previously added." Apr 29 2020, 7:17 AM
schweitz added a project: Restricted Project.
jeanPerier accepted this revision. Apr 29 2020, 8:40 AM
This revision is now accepted and ready to land. Apr 29 2020, 8:40 AM
schweitz updated this revision to Diff 261076. Apr 29 2020, 3:57 PM

review comments: minor edits to improve the text

Could you comment on whether this mangling will have any effect on interfacing with C/C++? Will it have any effect on LTO? What happens if a bind name is specified?
http://web.mit.edu/tibbetts/Public/inside-c/www/mangling.html

Hi Kiran,

Good questions and thanks for asking.

The hope, of course, is that this mangling will not conflict with C and C++. None of the languages (C, C++, or Fortran) has a standard mangling. C reserves the underscore, double underscore, and underscore capital letter prefixes [1],[2]. A common C++ name mangling scheme is described in [3] and in your link. The only thing Fortran name mangling implementations seem to have in common is that each vendor has its own, as can be seen by experimenting with [4].

The uniquing scheme described in this document resembles other mangling schemes in some respects and differs in others, but it was designed to minimize collisions with those name spaces.

As for bind C names, the plan is to use the bind C name directly. It should never have the prefix marker "_Q", so it will be recognized as a symbol name that was not uniqued. That may or may not be sufficient, depending on other unknowns. (We are similarly targeting LLVM intrinsic functions in our present work.) The fallback plan would be to unique the name and then relabel it when lowering to LLVM.
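A minimal sketch of that plan, for illustration only. The helper names `is_uniqued` and `symbol_for` and the sample symbol `c_func` are hypothetical and are not part of flang; only the "_Q" prefix marker comes from the discussion above.

```python
from typing import Optional

# Prefix marker for uniqued names, as named in the reply above.
UNIQUED_PREFIX = "_Q"


def is_uniqued(symbol: str) -> bool:
    """Return True if the symbol carries the uniquing prefix marker."""
    return symbol.startswith(UNIQUED_PREFIX)


def symbol_for(uniqued_name: str, bind_c_name: Optional[str] = None) -> str:
    """Choose the symbol name for an entity (hypothetical helper).

    A bind C name, when present, is used verbatim; otherwise the
    uniqued name is used.
    """
    if bind_c_name is not None:
        return bind_c_name
    return uniqued_name


if __name__ == "__main__":
    for name in ("_QMmodSs2mod", "c_func"):
        print(name, "uniqued" if is_uniqued(name) else "not uniqued")
```

Because a bind C name lacks the prefix, a consumer can tell the two kinds of symbol apart without any side table.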

The bidirectional ability and flexibility are key objectives. As a result, these names may never be exposed to the LLVM layer, the object files, the linker, and so on. Since this scheme can recover the symbols from the front-end, the symbols can themselves be lowered as a conversion and in a target-dependent manner.

[1] https://stackoverflow.com/questions/39625352/why-do-some-functions-in-c-have-an-underscore-prefix
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[3] https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
[4] https://fortran.godbolt.org/

sscalpone accepted this revision. Apr 30 2020, 8:24 AM

Reminder: flang should follow the well-established naming conventions when creating external names for "f77" entities. There's nothing in this proposal that blocks that; this is just an FYI.

flang/documentation/BijectiveInternalNameUniquing.md
8

"goal" could be "feature" or "requirement" -- not important.

schweitz updated this revision to Diff 261244. Apr 30 2020, 8:36 AM

s/goal/requirement/

Thanks @schweitz for the detailed reply.

I have a couple more questions.
-> Isn't this necessary for the Block construct?
-> Will uniquing all names lead to lower readability of the IR? (Assuming that is what is being proposed here.)
-> Yeah, renaming local variables back to original names (during lowering to LLVM IR) when there is no clash seems better for readability.

The uniquing is required in the context of the MLIR Module symbol space, that is, for artifacts with a process lifetime such as functions, globals, etc. Locals need not be uniqued, as they have a unique identity as SSA values. (Their names are tracked with name attributes attached to the Op.)

schweitz marked an inline comment as done. Apr 30 2020, 9:18 AM

Thanks @schweitz. Adding the above information to the doc might be helpful.

schweitz updated this revision to Diff 261274. Apr 30 2020, 10:07 AM

Clarify that the uniquing is for global artifacts.

tskeith added inline comments.
flang/documentation/BijectiveInternalNameUniquing.md
35

Submodules of an ancestor module have to have distinct names, so you don't need to include s1mod in the unique name for s2mod (though it doesn't hurt). So the unique name for s2mod could be just _QMmodSs2mod.

50

What about the blank common block? Is its name just _QB?

59

On line 30 it says F is the prefix. P makes more sense to me.
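The submodule point raised for line 35 can be illustrated with a small sketch. It is not flang's implementation: the function names are hypothetical, the full form `_QMmodSs1modSs2mod` is inferred from the comment rather than quoted from the document, and it assumes user identifiers are lower-cased so that the uppercase letters `M` and `S` can act as markers.

```python
from typing import List, Tuple

PREFIX = "_Q"  # prefix marker for uniqued names


def unique_submodule_name(module: str, submodules: List[str],
                          include_ancestors: bool = True) -> str:
    """Build a unique name for the last submodule in the chain.

    With include_ancestors=False only the innermost submodule is kept,
    which is the shorter form suggested in the comment.
    """
    chain = submodules if include_ancestors else submodules[-1:]
    # Assumption: names are lower-cased, so "M" and "S" are unambiguous markers.
    return PREFIX + "M" + module.lower() + "".join("S" + s.lower() for s in chain)


def parse_submodule_name(unique: str) -> Tuple[str, List[str]]:
    """Recover (module, submodule chain) from a name built above."""
    head = PREFIX + "M"
    if not unique.startswith(head):
        raise ValueError("not a uniqued module name: " + unique)
    module, *subs = unique[len(head):].split("S")
    return module, subs


if __name__ == "__main__":
    full = unique_submodule_name("mod", ["s1mod", "s2mod"])
    short = unique_submodule_name("mod", ["s1mod", "s2mod"], include_ancestors=False)
    print(full, parse_submodule_name(full))
    print(short, parse_submodule_name(short))
```

Both forms can be parsed back, but only the longer one recovers the intermediate submodule s1mod, which is the extra origin information the more cautious form preserves.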

This revision was automatically updated to reflect the committed changes.
schweitz marked 3 inline comments as done. May 1 2020, 8:27 AM
schweitz added inline comments.
flang/documentation/BijectiveInternalNameUniquing.md
35

True. It seemed the more cautious approach to include more information about the symbol origin.

50

Yes.

59

One (F) signifies scope and the other (P) identity.
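As a small illustration of the answer on line 50 (the blank common block), here is a sketch with a hypothetical helper `common_block_name`. Only the blank case `_QB` is confirmed in this thread; the named form, a `B` marker followed by the lower-cased block name, is an assumption extrapolated from it.

```python
def common_block_name(name: str = "") -> str:
    """Return a unique name for a common block (hypothetical helper).

    The blank common block has an empty name, so its unique name is
    just the prefix and marker: "_QB".
    """
    # Assumption: a named common block appends its lower-cased name.
    return "_QB" + name.lower()


if __name__ == "__main__":
    print(common_block_name())       # blank common block
    print(common_block_name("blk"))  # named block (assumed form)
```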