This is an archive of the discontinued LLVM Phabricator instance.

Implement attribute target multiversioning
AbandonedPublic

Authored by erichkeane on Oct 5 2017, 12:59 PM.

Details

Summary

GCC's attribute 'target', in addition to being an optimization hint,
also allows function multiversioning. We currently have the former
implemented, this is the latter's implementation.

Note that it ends up having to permit redefinition of functions so
that they can all be emitted. Additionally, all versions of the function
must be emitted, so this also manages that.

Function templates are NOT supported (not supported in GCC either).

Options on how to split this patch up would also be particularly solicited,
since this IS a large patch.

Diff Detail

Event Timeline

erichkeane created this revision.Oct 5 2017, 12:59 PM
hfinkel added a subscriber: hfinkel.Oct 5 2017, 1:19 PM

Can you explain briefly what is required of the system in order to support this (is it just ifunc support in the dynamic linker / loader)?

In general, there are a number of places in this patch, at the Sema layer (including in the error messages), that talk about how this is an x86-only feature. As soon as someone adds support for a second architecture, this will cause unnecessary churn (and perhaps miss places resulting in stale comments). Can you create some callbacks, in TargetInfo or similar, so that we can just naturally make this work on other architectures?

Function templates are NOT supported (not supported in GCC either).

Can you please elaborate on this? I'd like to see this feature, but I don't believe that we should introduce features that artificially force users to choose between good code design and technical capabilities. The fact that GCC does not support this feature on templates seems unfortunate, but not in itself a justification to replicate that restriction. I'm obviously fine with adding template functions in a follow-up commit, but I want to understand whether there's something in the design (of the feature or of your implementation approach) that makes this hard. I wouldn't want this to ship without template support.

lib/Basic/Targets/X86.cpp
1335

How are we expected to maintain this array?

Can you explain briefly what is required of the system in order to support this (is it just ifunc support in the dynamic linker / loader)?

Yep, this feature depends entirely on ifuncs.

In general, there are a number of places in this patch, at the Sema layer (including in the error messages), that talk about how this is an x86-only feature. As soon as someone adds support for a second architecture, this will cause unnecessary churn (and perhaps miss places resulting in stale comments). Can you create some callbacks, in TargetInfo or similar, so that we can just naturally make this work on other architectures?

Yep, I have that enforced, with the hope that it would make it easier to identify all the places that this change needs to be made. Currently, there are a few 'TargetInfo' features that need to be supported (isValidCPUName/isValidCPUFeature/compareCpusAndFeatures), the SEMA check relaxed (though I could perhaps add a TargetInfo::IsFunctionMultiVersioningSupported function if thats what you're saying?), and CodeGenFunction::FormResolverCondition/CodeGenFunction::EmitMultiVersionResolver ( which could easily have FormResolverCondition moved to TargetInfo, assuming we OK with a CodeGen file calling into a TargetInfo?). Are these the changes you mean?

Function templates are NOT supported (not supported in GCC either).

Can you please elaborate on this? I'd like to see this feature, but I don't believe that we should introduce features that artificially force users to choose between good code design and technical capabilities. The fact that GCC does not support this feature on templates seems unfortunate, but not in itself a justification to replicate that restriction. I'm obviously fine with adding template functions in a follow-up commit, but I want to understand whether there's something in the design (of the feature or of your implementation approach) that makes this hard. I wouldn't want this to ship without template support.

I don't think there are limitations beyond my understanding of the code around redefinitions of templates. I investigated for a while and was unable to figure it out over a few days, but that doesn't mean terribly much. I gave up temporarily in exchange for getting the rest of the feature in (and the fact that GCC doesn't support it made it an explainable stopping point; additionally, none of our users of 'attribute-target' actually have C++ codebases). I believe I'd be able to add function-template support with some additional work, and am definitely willing to do so.

Thanks Hal, I appreciate your advice/feedback! I realize this is a huge patch, so anything I can get to make this more acceptable is greatly appreciated.

lib/Basic/Targets/X86.cpp
1335

Ugg... I know. I have no idea, it is ANOTHER implementation of this foolish list, just in a slightly different order. We NEED to know the order, so we know how to order the comparisons. They are generally arbitrary orderings, so I have no idea besides "with hard work". If you have a suggestion, or perhaps a way to reorder one of the OTHER lists, I'd love to hear it.

Can you explain briefly what is required of the system in order to support this (is it just ifunc support in the dynamic linker / loader)?

Yep, this feature depends entirely on ifuncs.

In general, there are a number of places in this patch, at the Sema layer (including in the error messages), that talk about how this is an x86-only feature. As soon as someone adds support for a second architecture, this will cause unnecessary churn (and perhaps miss places resulting in stale comments). Can you create some callbacks, in TargetInfo or similar, so that we can just naturally make this work on other architectures?

Yep, I have that enforced, with the hope that it would make it easier to identify all the places that this change needs to be made. Currently, there are a few 'TargetInfo' features that need to be supported (isValidCPUName/isValidCPUFeature/compareCpusAndFeatures), the SEMA check relaxed (though I could perhaps add a TargetInfo::IsFunctionMultiVersioningSupported function if thats what you're saying?), and CodeGenFunction::FormResolverCondition/CodeGenFunction::EmitMultiVersionResolver ( which could easily have FormResolverCondition moved to TargetInfo, assuming we OK with a CodeGen file calling into a TargetInfo?). Are these the changes you mean?

I added some comments inline.

Function templates are NOT supported (not supported in GCC either).

Can you please elaborate on this? I'd like to see this feature, but I don't believe that we should introduce features that artificially force users to choose between good code design and technical capabilities. The fact that GCC does not support this feature on templates seems unfortunate, but not in itself a justification to replicate that restriction. I'm obviously fine with adding template functions in a follow-up commit, but I want to understand whether there's something in the design (of the feature or of your implementation approach) that makes this hard. I wouldn't want this to ship without template support.

I don't think there are limitations beyond my understanding of the code around redefinitions of templates. I investigated for a while and was unable to figure it out over a few days, but that doesn't mean terribly much. I gave up temporarily in exchange for getting the rest of the feature in (and the fact that GCC doesn't support it made it an explainable stopping point; additionally, none of our users of 'attribute-target' actually have C++ codebases).

I anticipate this will change.

I believe I'd be able to add function-template support with some additional work, and am definitely willing to do so.

Great, thanks!

Thanks Hal, I appreciate your advice/feedback! I realize this is a huge patch, so anything I can get to make this more acceptable is greatly appreciated.

include/clang/Basic/DiagnosticSemaKinds.td
9343

I'd not mention x86 here. As soon as we add support for another architecture, we'll need to reword this anyway.

: Error<"function multiversioning with 'target' is not supported on this architecture and platform">;
lib/Basic/Targets/X86.cpp
1301

Can you please separate out these changes adding the amdfam10 target strings into a separate patch?

1335

At the risk of a comment that will become stale at some point, is it possible to say something here like:

// This list has to match the list in GCC. In GCC 6.whatever, the list is in path/to/file/in/gcc.c

At least that would give someone a clue of how to update this list if necessary.

lib/Sema/SemaDecl.cpp
9312

No need to mention x86 in this comment.

9342

No need to have x86 here in the comment or the check. Just add some function to TargetInfo:

if (!Context.getTargetInfo().supportsFuncMultiversioning()) {
  Diag(
test/CodeGenCXX/attr-target-multiversion.cpp
21

Tests for interactions with function pointers and virtual functions?

echristo edited edge metadata.Oct 5 2017, 6:36 PM

This generally works for me modulo the things that Hal has mentioned. You'll probably want to add Richard to the review list for the Sema bits as well.

Thanks!

lib/Sema/SemaDecl.cpp
3258

Extra whitespace.

aaron.ballman added a subscriber: aaron.ballman.
aaron.ballman added inline comments.
include/clang/AST/Decl.h
2240

Drop the this->

2243

Same

include/clang/Basic/Attr.td
1847

operator== instead of operator ==

1852

Same

1898

Don't use auto here.

include/clang/Basic/DiagnosticSemaKinds.td
9339

'causes' seems a bit strange; how about function redeclaration declares a multiversioned function,? (Or something else, I'm not tied to my wording.)

include/clang/Sema/Sema.h
1944

It's a bit strange that this function is called "Check" but it mutates the arguments.

lib/Basic/Targets/X86.cpp
1407

const auto &

lib/CodeGen/CodeGenFunction.h
3758

The comment doesn't inspire confidence with the trailing question mark. ;-)

lib/Sema/SemaDecl.cpp
9304

const auto *

9308

Don't use auto here.

9322

const auto &

9344

Don't use auto here.

9350

Should be named Result

9358

Remove the comma after case.

9360–9363

Might be more succinctly written with ?: (unsure which looks better, try it and see).

9367

const auto * -- might as well hoist this to the top of the function and remove the call to hasAttr<>() above.

9369

I suspect this will give warnings about having a default label in a fully-covered switch.

9407

a 'all' -> an 'all'

9429

Attribute -> attribute

9435

MV -> MultiVersion

9590

Can use dyn_cast_or_null<>() instead.

12250

const auto *

12251

Don't use auto here.

12255

const auto *.

erichkeane marked 31 inline comments as done.Oct 6 2017, 12:53 PM
erichkeane added a subscriber: rnk.

Weew... I think I got everything. Thanks for the review you three! I've got another patch coming momentarily with what I believe is all your suggestions.

include/clang/Basic/DiagnosticSemaKinds.td
9339

Thats as good as anything I can think of. I'll change to yours for now, but if someone else comes up with something that sounds better, I'll consider us open to it.

lib/Basic/Targets/X86.cpp
1301

Talked to Craig on this... apparently the rules for amdfam10/amdfam15/amdfam10h/amdfam15h are a bit more complicated (-march supports ONLY the amdfam10, validateCpuIs only supports the "h" versions (of both), and only the 'march' version is supported in this feature. I may have to toss another function to mess with these strings to make it work.

lib/CodeGen/CodeGenFunction.cpp
2295

There IS a "CodeGen/TargetInfo" here, but it isn't as comprehensive as the Basic/TargetInfo types. I wonder if there would be value in creating a 'TargetCodeGenInfo::EmitCpuIs' and TargetCodeGenInfo::EmitCpuSupports to replace the X86 version of these, and suppressing this?

It might be valuable to simply replace the 'EmitTargetArchBuiltinExpr' with something to do this as well. Those handful of target-arch-builtins are a little messy, and are pretty messy in CGFunction.

Probably will have to ask @rnk if he's got a good thought here. I mentioned trying to break up this target info at one point, and he thought it wasn't a great idea.

lib/CodeGen/CodeGenFunction.h
3758

Woops! Personal review comment was left in :)

lib/Sema/SemaDecl.cpp
9360–9363

It does actually... Clang format makes this look really nice.

9369

I didn't see any, but I have no problem removing it anyway.

test/CodeGenCXX/attr-target-multiversion.cpp
21

Looking into it, I'm not sure how virtual functions should LOOK in this case (and GCC doesn't actually implement it either!). I might punt for now while the rest of the patch is in shape, and add it in with function templates/etc.

erichkeane marked 5 inline comments as done.

All requested changes AFAIK. Also, discovered that I wasn't calling cpu-init at the beginning of my resolver, so added that.

Note: Virtual functions seem to have a bunch of complexity involved, so I'm going to punt on that for now. GCC also doesn't support it, so I feel ok doing that later while dealing with function templates.

aaron.ballman edited edge metadata.Oct 8 2017, 7:28 AM

The attribute and sema bits look good to me, but I agree that you might want Richard's opinions before committing.

lib/Sema/SemaDecl.cpp
9312

const auto *

The attribute and sema bits look good to me, but I agree that you might want Richard's opinions before committing.

Oh, definitely! I actually put this up for @echristo to take a look at as a sanity check. I realized it was a big patch so didn't want to spam a bunch of people. @rsmith was definitely on the list! I'll get this last const-auto fix (FWIW, I'm seeing the benefits now of for (X: Xs) being the same as for (const auto &&X : Xs)...) and add him and @rnk to start doing this as a 'final' review.

Thank you very much @echristo, @hfinkel and @aaron.ballman !

erichkeane updated this revision to Diff 118218.Oct 9 2017, 9:34 AM
erichkeane added reviewers: rnk, rsmith.

1 more const-auto. Also noticed I'd missed that removing the 'default' caused the function to fallthrough, so I added llvm_unreachable to the bottom.

Craig noticed a pair of ordering issues in the TargetArray list, so fixed htose.

@rsmith and @rnk just a quick bump! I'd like to get your thoughts on the SEMA changes here.

Thanks!
-Erich

Just a quick rebase, there were a few changes in the area.

I note my rebase lost the tests... I'll fix that up soon, sorry guys!

Realized I can put variables in an Attribute, so I put MV info in the Attribute rather than at the FD level.

@AaronBallman and @echristo, if you could take another look, I'd appreciate it.

aaron.ballman added inline comments.Oct 27 2017, 5:36 AM
include/clang/Basic/Attr.td
1917

Align the comments (at least the wrapped part of them)?

lib/Sema/SemaDecl.cpp
9306

OldFD can be const (and that can be propagated to the other decls).

9317

Do you want to assert that TA is not null or handle treat null specially?

9368

Reverse the conditions so the inexpensive bit is checked first, or early-break if Result becomes false.

9370

Reverse conditions.

erichkeane marked 2 inline comments as done.Oct 27 2017, 8:42 AM
erichkeane added inline comments.
include/clang/Basic/Attr.td
1917

Ugg... I did a ton of work to get these lined up, then format must have had its fun with it. I'll reflow these again.

lib/Sema/SemaDecl.cpp
9368

Ah, right... I think for a while during implementation the "Check" function was doing important work, but this must have been fixed along the way. Fix incoming!

Fixes for Aaron's comments.

Fix comment in Attr.td that @aaron.ballman brought up on IRC.

I think the attribute and sema parts look good, but I leave the codegen bits to someone with more experience in that area.

rsmith edited edge metadata.Oct 27 2017, 4:53 PM

I'm not entirely happy with the AST representation you're using here. Allowing multiple declarations of the same entity to have (semantically distinct) bodies breaks our AST invariants, and will cause things like our PCH / modules support to fail. This can probably be made to work, but I'd like to see what that looks like before we commit to modeling a multiversioned function this way rather than, say, with different redeclaration chains for each version.

You should add tests for:

  • the PCH / Modules case (particularly regarding what happens when merging multiple copies of the same set of functions, for instance when they're defined inline); I would expect this doesn't currently work with this patch, because we only ever load one function body per redeclaration chain
  • multiversioned functions declared inline
  • constant expression evaluation of calls to such functions (which should -- presumably -- either always use the default version or be rejected as non-constant)

I would also expect that asking a multiversioned FunctionDecl for its definition would only ever give you the target("default") definition or the one-and-only definition, rather than an arbitrary declaration that happens to have a body.

lib/Basic/Targets/X86.cpp
1335

I don't see why this list would need to match the list in some GCC source file. What this list has to match is the set of targets we wish to support, which (for compatibility with GCC) should contain all those that GCC also supports. And it sounds like there are ordering constraints implied by the contents of this list. That's what you should be talking about here.

1336

Where did this table and code actually come from?

1337

Deducing std::initializer_list is an abomination. Use const char *TargetArray[] or similar here instead.

1399

Can you replace this list (and the repetition of it elsewhere in this file) with a .def file carrying the same information?

lib/CodeGen/CodeGenModule.cpp
421–423

I'm not happy with this diagnostic being produced by IR generation. We should diagnose this at the point of first use of the multiversioned function, within Sema.

2134

CodeGen should not be inventing mangled names with external linkage. Please move this to the name mangler. Is this mangling scheme compatible with GCC? How will we avoid name collisions between this and other uses of dot suffixes?

test/CodeGenCXX/attr-target-multiversion.cpp
22

This looks like it has the wrong linkage.

Hi Richard, thanks for the review! I'll make another attempt with this based on your feedback, though if you could clarify your suggestion, it would be greatly appreciated.

I'm not entirely happy with the AST representation you're using here. Allowing multiple declarations of the same entity to have (semantically distinct) bodies breaks our AST invariants, and will cause things like our PCH / modules support to fail. This can probably be made to work, but I'd like to see what that looks like before we commit to modeling a multiversioned function this way rather than, say, with different redeclaration chains for each version.

I'd experimented with (but not succeeded with) simply preventing merging of the decls, but had a tough time keeping like-versions together for the purposes of the iFunc. Additionally, clarifying the 'call' later on became quite difficult. I can definitely give that another try though.

You should add tests for:

  • the PCH / Modules case (particularly regarding what happens when merging multiple copies of the same set of functions, for instance when they're defined inline); I would expect this doesn't currently work with this patch, because we only ever load one function body per redeclaration chain

I'm not sure I know what one of those looks like... is there a good test that I can use to learn the modules implementation?

  • multiversioned functions declared inline

I don't see anything changing for this situation, but I imagine you have an issue in mind that I'm missing?

  • constant expression evaluation of calls to such functions (which should -- presumably -- either always use the default version or be rejected as non-constant)

I'd say they should always do the default version, but this isn't something I did any work on. I'll look at that.

I would also expect that asking a multiversioned FunctionDecl for its definition would only ever give you the target("default") definition or the one-and-only definition, rather than an arbitrary declaration that happens to have a body.

Right, this is a big catch that I didn't think about...

lib/Basic/Targets/X86.cpp
1336

I took the list of things recognized by GCC, ran them through godbolt to see the list.

1399

I've been working with Craig trying to do something like this, though not with a .def file. So far I've yet to come up with something signifiantly better, but I'm still working on it.

lib/CodeGen/CodeGenModule.cpp
421–423

Unfortunately, there is no real way to tell there. A usage could happen before the 2nd definition, so it wouldn't know that it IS a multiversioning at that point.

2134

I can definitely extract this over to the name manglers. This mangling scheme is exactly what GCC and ICC do currently. I have no idea how it avoids collisions besides not being able to define the root name with another mechanism.

rsmith added a comment.Nov 1 2017, 5:26 PM

You should add tests for:

  • the PCH / Modules case (particularly regarding what happens when merging multiple copies of the same set of functions, for instance when they're defined inline); I would expect this doesn't currently work with this patch, because we only ever load one function body per redeclaration chain

I'm not sure I know what one of those looks like... is there a good test that I can use to learn the modules implementation?

There are a bunch in test/PCH and test/Modules that you could crib from.

  • multiversioned functions declared inline

I don't see anything changing for this situation, but I imagine you have an issue in mind that I'm missing?

You need to make sure you use the correct linkage for the versions and for the dispatcher. Your tests seem to demonstrate you're getting this wrong at the moment. Also, this is the case where the mangling of the target-specific definitions would become part of the ABI.

  • constant expression evaluation of calls to such functions (which should -- presumably -- either always use the default version or be rejected as non-constant)

I'd say they should always do the default version, but this isn't something I did any work on. I'll look at that.

I would also expect that asking a multiversioned FunctionDecl for its definition would only ever give you the target("default") definition or the one-and-only definition, rather than an arbitrary declaration that happens to have a body.

Right, this is a big catch that I didn't think about...

What I'm essentially thinking here is that a declaration with a body for a multiversioned function, that isn't for the "default" target, would not be treated as being "the definition" at all.

The big pain point here is that we don't know whether a function is multiversioned until we see the *second* version of it (or a declaration with no target attribute or a declaration with a "default" target), and we generally want our AST to be immutable-once-created. I don't see a good way to handle this except by mutating the AST when we see the second version (which leads to complexity, particularly around PCH -- you need to inform AST mutation listeners that this has happened and write a record out into the AST file to indicate you've updated an imported entity) -- or by treating any function with a target attribute as if it were multiversioned, and not treating a declaration with a body as being the definition until / unless we see the first use of it (which, again, leads to an AST mutation of some kind at that point).

Keeping the redeclaration chains separate would avoid this problem, but we'd still need to do *something* to track the set of versions of the function so we can emit them as part of the ifunc.

rsmith added inline comments.Nov 1 2017, 5:32 PM
lib/Basic/Targets/X86.cpp
1336

OK, that sounds reasonable. Please rephrase this comment; the current wording could be read as suggesting this is in some way derived from the GCC source code, which is clearly not what you did, nor what we're suggesting that people updating this list do.

lib/CodeGen/CodeGenModule.cpp
421–423

I think we should reject any case where new versions of the function are added after the first use. Generally we want our extensions to support eager code generation, even in cases where that means we reject certain setups that GCC copes with.

The big pain point here is that we don't know whether a function is multiversioned until we see the *second* version of it (or a declaration with no target attribute or a declaration with a "default" target), and we generally want our AST to be immutable-once-created. I don't see a good way to handle this except by mutating the AST when we see the second version (which leads to complexity, particularly around PCH -- you need to inform AST mutation listeners that this has happened and write a record out into the AST file to indicate you've updated an imported entity) -- or by treating any function with a target attribute as if it were multiversioned, and not treating a declaration with a body as being the definition until / unless we see the first use of it (which, again, leads to an AST mutation of some kind at that point).

Keeping the redeclaration chains separate would avoid this problem, but we'd still need to do *something* to track the set of versions of the function so we can emit them as part of the ifunc.

That is definitely a pain point. The solution I've had bouncing around in my head is the following, but it has some limitations that GCC doesn't have:
1- Convert to multiversion not permitted unless 1 of the 2 is 'default' (since it is required).
2- Convert to multiversion not permitted if the decl has been used.
3- Prevent the 'lookup' from finding the non-default version, this permits things like constexpr to work properly.
4- Add a list of the "Multiversion Decls" to the 'default' FunctionDecl.

My hope being that #2 makes it so that no one would really NOTICE the AST rewriting, since it hasn't been used.
#1 enables #3, so that the only 'user' of the multiversion'ed decls ends up being the CodeGen emitting the ifunc.
#4 of course is to give CodeGen the ability to emit all of the definitions.

Is there a better way to 'downgrade' the non-default FunctionDecl's to 'not lookupable'? Or is this something I would need to add?

Finally, 1 further idea is that "default" automatically creates it as a 'multiversion case'. Since a sole 'default' would be invalid in GCC, I think that makes us OK, right?

Thoughts on the above proposal? I have basically abandoned this review, since the changes are pretty intense. I've also submitted another patch (that Craig Topper is helping me with) to cleanup the CPU/Feature definitions to make that list more sane.

BTW: Just saw your 2nd response: I think the rejection of cases where multiversioning is caused after first-use is the only real way to make this sane.

Thanks again for your input!

erichkeane abandoned this revision.Dec 4 2017, 4:47 PM

Completely superceededby this patch here: https://reviews.llvm.org/D40819

Since that one is a complete redesign/reimplementation, I'd prefer to do it separately.