Page MenuHomePhabricator

[CodeGen] [CUDA] Add the ability set default attrs on functions in linked modules.
ClosedPublic

Authored by jlebar on Jan 10 2017, 3:52 PM.

Details

Summary

Now when you ask clang to link in a bitcode module, you can tell it to
set attributes on that module's functions to match what we would have
set if we'd emitted those functions ourselves.

This is particularly important for fast-math attributes in CUDA
compilations.

Each CUDA compilation links in libdevice, a bitcode library provided by
nvidia as part of the CUDA distribution. Without this patch, if we have
a user-function F that is compiled with -ffast-math that calls a
function G from libdevice, F will have the unsafe-fp-math=true (etc.)
attributes, but G will have no attributes.

Since F calls G, the inliner will merge G's attributes into F's. It
considers the lack of an unsafe-fp-math=true attribute on G to be
tantamount to unsafe-fp-math=false, so it "merges" these by setting
unsafe-fp-math=false on F.

This then continues up the call graph, until every function that
(transitively) calls something in libdevice gets unsafe-fp-math=false
set, thus disabling fastmath in almost all CUDA code.

Diff Detail

Repository
rL LLVM

Event Timeline

jlebar updated this revision to Diff 83887.Jan 10 2017, 3:52 PM
jlebar retitled this revision from to [CodeGen] [CUDA] Add the ability set default attrs on functions in linked modules..
jlebar updated this object.
jlebar added a reviewer: echristo.
jlebar added subscribers: llvm-commits, hfinkel.

Friendly ping.

Test?
Otherwise looks OK.

clang/include/clang/Frontend/CodeGenOptions.h
139 ↗(On Diff #83887)

Nit: I don't think "Propagate" is the right word, I feel something like "Override" would be better to describe what's happening.

clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

Are we sure it always work? What if a function is annotated with an attribute incompatible with the attributes you're gonna add?

Test?

Yeah, initially I didn't think it was simple, but I have an idea. I will look into it...

jlebar planned changes to this revision.Jan 13 2017, 10:31 PM

Marking planned changes so I remember to see about that test, thanks for reminding me.

clang/include/clang/Frontend/CodeGenOptions.h
136 ↗(On Diff #83887)

Note to self, typo.

139 ↗(On Diff #83887)

Right now it doesn't override them, though. It just adds the new attrs. You're saying below that this is not safe, so maybe if we change that we should also change the name...

clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

I have no idea -- what is going to happen?

In the CUDA use-case we make tons of assumptions about the contents of libdevice -- if we're making an additional assumption here, I'm fine with that. Obviously we don't just want to nuke all of the attributes on the incoming function, because some of them might be derived from the IR as opposed to the codegen options.

mehdi_amini added inline comments.Jan 13 2017, 10:44 PM
clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

I have no idea -- what is going to happen?

BAD THINGS WILL HAPPEN! ;)

(The verifier will fail)

Test?

Yeah, initially I didn't think it was simple, but I have an idea. I will look into it...

I thought a simple "fake" runtime which is a single IR function that you link in and check that the emitted IR has the attributes added?

jlebar updated this revision to Diff 84638.Jan 17 2017, 12:03 AM

Add test.

Mehdi, Eric, wdyt?

hfinkel added inline comments.Jan 17 2017, 6:41 AM
clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

I think we need to decide what happens here. We can either remove attributes we're going to set, merge them properly, or skip them. I think that we should do the following:

  1. Merge attributes properly (or just skip those already present on the function)
  2. Add a flag to Clang, that can be used when compiling libdevice (or whatever), that causes functions to be generated with only minimal attributes (i.e. keep readnone/only for pure/const, but don't add unsafe-math flags or target info).

Also, higher-level question: Why is this bitcode-linking functionality specific to CUDA at all?

jlebar added inline comments.Jan 17 2017, 9:19 AM
clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

Merge attributes properly (or just skip those already present on the function)

There are 27 attributes here -- I am pretty hesitant to add merging logic for each one individually. That logic will be dead code, with all the resulting problems (yagni, probably doesn't actually work, etc.).

Either skipping attrs already present on the function or explicitly overwriting them seems to be the Simplest Thing That Could Possibly Work, so if it's OK with you all, I'd prefer to do one of those two.

Looking through the list of attrs, many of them are things you'd want to overwrite-if-present rather than skip-if-present, at least in my use-case (which is the only one we have atm). And indeed, per below, using overwrite-if-present gets us out of adding additional complexity and dead code to clang, so my inclination is to go towards that.

If there are a few attrs here that we *don't* want to overwrite-if-present, maybe we could just move them out of this function entirely and not set them when linking. All I actually care about are the math flags and the CUDA flags; I just thought it was a cleaner API to say that we set all these attrs than to cherrypick.

Add a flag to Clang, that can be used when compiling libdevice (or whatever), that causes functions to be generated with only minimal attributes (i.e. keep readnone/only for pure/const, but don't add unsafe-math flags or target info).

We do not compile libdevice, so this will again be dead code. Do we have a user in mind for this?

To me the Simplest Thing would be to overwrite-if-present, per above. Then we don't need to add any dead code or additional flags. I also think that's probably what you want when you're doing libdevice linking, which is our only usecase at the moment.

Why is this bitcode-linking functionality specific to CUDA at all?

What in the patch makes you think it is? Is the issue that we can get this functionality only via -mlink-cuda-bitcode -- -mlink-bitcode does something different?

Again we could "fix" this by coming up with some "link spec" you pass to -mlink-bitcode along with the bitcode file, but that'd be added complexity for little gain, I think?

hfinkel added inline comments.Jan 20 2017, 4:57 PM
clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

There are 27 attributes here -- I am pretty hesitant to add merging logic for each one individually. That logic will be dead code, with all the resulting problems (yagni, probably doesn't actually work, etc.).

To be clear, I'm completely against adding dedicated merging logic here. Either we have merging logic (used by the inliner or whatever) that we can use or we shouldn't do it.

king through the list of attrs, many of them are things you'd want to overwrite-if-present rather than skip-if-present, at least in my use-case (which is the only one we have atm). And indeed, per below, using overwrite-if-present gets us out of adding additional complexity and dead code to clang, so my inclination is to go towards that.

Sounds good to me. Please just add comments in the code explaining that this is specifically for libdevice, which is vendor provided, and the rest of the rationale from the patch description.

jlebar updated this revision to Diff 85305.Jan 22 2017, 4:36 PM

Add comment and update test to check overriding behvaior.

It turns out no code changes were necessary to get the "override-if-present"
behavior we wanted. addAttribute was already doing this.

jlebar updated this revision to Diff 85306.Jan 22 2017, 4:40 PM

Expand on comment a bit more.

clang/lib/CodeGen/CGCall.cpp
1720 ↗(On Diff #83887)

I'm completely against adding dedicated merging logic here. Either we have merging logic (used by the inliner or whatever) that we can use or we shouldn't do it.

Got it. AFAICT we don't have existing merging logic for most of these attributes. (The relevant logic seems to live in IR/Attributes.cpp and in Attributes.td.)

I've updated the patch to add a test and comments, as you suggested. PHAL.

hfinkel added inline comments.Jan 23 2017, 6:44 AM
clang/lib/CodeGen/CodeGenModule.h
1030 ↗(On Diff #85306)

I think there is an important point here that is missing: for libdevice, we happen to know that this is safe. I think that needs to be in the comment somehow. In general, this is dangerous. libdevice, as I understand it, is specifically designed to make this work (via NVVMReflect).

jlebar updated this revision to Diff 85449.Jan 23 2017, 1:31 PM

Update comment per Hal's comments.

jlebar marked an inline comment as done.Jan 23 2017, 1:32 PM
jlebar added inline comments.
clang/lib/CodeGen/CodeGenModule.h
1030 ↗(On Diff #85306)

I've added a comment, but I'm not sure it's quite as dangerous as you seem to think, so maybe my comment isn't scary enough. I'm happy to continue iterating.

Looking through the specific attrs affected here, for everything other than the fast-math attrs, it seems that we're merely adding attrs to make the code more conservative, but never removing them.

libdevice, as I understand it, is specifically designed to make this work (via NVVMReflect).

FWIW I wouldn't really say that... NVVMReflect is an over-general solution to the same problem solved by the denormal-fp-math attr. Ultimately I would like to nix all of the nvptx-specific FTZ attrs and just use denormal-fp-math.

hfinkel accepted this revision.Jan 23 2017, 2:34 PM
hfinkel added inline comments.
clang/lib/CodeGen/CodeGenModule.h
1030 ↗(On Diff #85306)

This gets the point across. I'm happy.

This revision is now accepted and ready to land.Jan 23 2017, 2:34 PM
echristo accepted this revision.Jan 23 2017, 2:36 PM

(Hal and Mehdi have been looking at this, but wanted to chime in that I'm also happy with this solution).

-eric

This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.