This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Fix function identifiers that are invalid in PTX and a bug fix for the case of name collisions.
AcceptedPublic

Authored by arpith-jacob on Feb 29 2016, 1:55 PM.

Details

Summary

The characters '.' and '@' are not allowed by PTX in identifiers. A previous patch cleans up such identifiers for global variables. This patch cleans up identifiers of local functions as well.

An unrelated patch changed the name collision avoidance code in LLVM by adding a suffix of the pattern ["." number]. This broke NVPTXAssignValidGlobalNames.cpp as the dot is introduced in the case of collisions. I fix this by cleaning up the new name and repeating as required.

Diff Detail

Event Timeline

arpith-jacob retitled this revision from to [NVPTX] Fix function identifiers that are invalid in PTX and a bug fix for the case of name collisions..
arpith-jacob updated this object.
arpith-jacob added reviewers: eliben, rafael.
arpith-jacob added subscribers: jlebar, jholewinski, caomhin and 3 others.
jlebar added inline comments.Feb 29 2016, 2:15 PM
lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp
40

Nit: "name, ensuring", since the gerund phrase is an appositive.

74

cleanUpName returns a string -- isn't this a dangling pointer?

rafael edited edge metadata.Feb 29 2016, 2:28 PM
rafael added a subscriber: rafael.

Wouldn't it be better to error and ask the llvm producer to not create
global values with '@' and '.' when targeting nvptx?

Cheers,
Rafael

Wouldn't it be better to error and ask the llvm producer to not create
global values with '@' and '.' when targeting nvptx?

rL205212 suggests we already do generic sanitizing for all backends. If that's true, why is it inappropriate to do target-specific sanitization in addition? That seems cleaner than playing whack-a-mole in the frontend so it doesn't emit names that aren't valid, when compiling for this target.

arpith-jacob edited edge metadata.

Don't dangle pointers :(

arpith-jacob marked 2 inline comments as done.Feb 29 2016, 3:33 PM

Wouldn't it be better to error and ask the llvm producer to not create
global values with '@' and '.' when targeting nvptx?

Unfortunately the llvm producer changed, which broke code for this target without anyone noticing :(

lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp
74

Ouch!

jlebar accepted this revision.Feb 29 2016, 4:19 PM
jlebar added a reviewer: jlebar.

lgtm, but I'd like Rafael to indicate that his concern is resolved.

This revision is now accepted and ready to land.Feb 29 2016, 4:19 PM
arpith-jacob marked an inline comment as done.Mar 3 2016, 7:13 AM

Friendly ping. Rafael, do you have further concerns? Thanks.

Wouldn't it be better to error and ask the llvm producer to not create
global values with '@' and '.' when targeting nvptx?

http://reviews.llvm.org/rL205212 suggests we already do generic sanitizing for all backends. If that's true, why is it inappropriate to do target-specific sanitization in addition? That seems cleaner than playing whack-a-mole in the frontend so it doesn't emit names that aren't valid, when compiling for this target.

I disagree. LLVM has a bad history of playing whack-a-mole with what
names are supported by various formats and assemblers and renaming
stuff in the back. This creates problems with collisions (like you
found) and for any tools that needs to find the final name, like LTO.

The real stable solution I think is for the IR name to be the final
name. No further mangling involved. It will take a lot of work to get
there, but we can avoid making the situation worse.

Cheers,
Rafael

The real stable solution I think is for the IR name to be the final
name. No further mangling involved.

Correct me if I am wrong, but I think you are saying that the llvm IR name should only contain chars that are acceptable for *all* targets.

In this example, for valid ptx code, it's important that the period never appears in the IR name.

However, according to this patch:

For globals it is important to use "foo.1" to help C++ name demangling.

So the two consumers have opposing constraints and the suggested approach cannot work?

rnk added a subscriber: rnk.Apr 28 2016, 2:39 PM

The real stable solution I think is for the IR name to be the final
name. No further mangling involved. It will take a lot of work to get
there, but we can avoid making the situation worse.

The problem there is that we would have to make the IR symbol renamer not append '.N' when renaming symbols with name collisions. Until then, we need something like this.

I thought we used to have machinery in our assembler for mangling away invalid characters. Did you remove that stuff?

hfinkel added a subscriber: hfinkel.EditedJan 30 2017, 4:13 PM
In D17738#416060, @rnk wrote:

The real stable solution I think is for the IR name to be the final
name. No further mangling involved. It will take a lot of work to get
there, but we can avoid making the situation worse.

The problem there is that we would have to make the IR symbol renamer not append '.N' when renaming symbols with name collisions. Until then, we need something like this.

I thought we used to have machinery in our assembler for mangling away invalid characters. Did you remove that stuff?

How would we like to move forward here?

What @rafael said, that we should not re-mangle IR names, sounds appealing. This holds obvious tooling advantages. It is also true that we need some way to generate unique symbol names that does not interfere with existing mangling schemes. This means that we need to have some character that is not in [A-Za-z0-9_], and is not @ (because that's used by ELF symbol versioning). ?$@' are used by MSVC mangling. PTX allows only [A-Za-z0-9_$] (and also %, but only as the first character, so that doesn't help). Unfortunately, this does not seem to leave anything we can use (we can use '$' for PTX, but not for MSVC).

One option is that we add a function to LLVM get an available separator character, which can default to '.', but we set to '$' for nvptx, and use that for generating new names at the IR level. Thoughts?

rnk added a comment.Jan 30 2017, 5:05 PM

One option is that we add a function to LLVM get an available separator character, which can default to '.', but we set to '$' for nvptx, and use that for generating new names at the IR level. Thoughts?

This seems practical. Perhaps it could be part of the name mangling scheme already encoded in DataLayout?

espindola edited reviewers, added: espindola; removed: rafael.Mar 14 2018, 4:59 PM