This is an archive of the discontinued LLVM Phabricator instance.

Implement cpu_dispatch/cpu_specific Multiversioning
ClosedPublic

Authored by erichkeane on May 29 2018, 6:29 AM.

Details

Summary

As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.

This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.

This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.

The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.

Diff Detail

Repository
rC Clang

Event Timeline

erichkeane created this revision.May 29 2018, 6:29 AM
erichkeane added a subscriber: rsmith.
erichkeane marked 2 inline comments as done.May 29 2018, 1:57 PM

Woops, looks like I lost those in the rebase. Thanks for catching them!

craig.topper added inline comments.May 29 2018, 2:29 PM
include/clang/Basic/X86Target.def
295

Could we just make the features a comma separated string? Then we wouldn't need a third version of EmitX86CpuSupports? Yeah it would incur string processing costs, but is that a big deal?

lib/Sema/SemaDecl.cpp
9286

No need for else after return.

lib/Sema/SemaDeclAttr.cpp
1877

Maybe use llvm::any_of since you don't actually care about the iterator returned.

aaron.ballman added inline comments.May 30 2018, 5:43 AM
include/clang/AST/Decl.h
2212–2213

Pedantic nit: CPU instead of Cpu?

include/clang/Basic/Attr.td
850

Does GCC really support this spelling? I couldn't find documentation to suggest it, and a quick test suggests it's not supported there. Perhaps this should use the Clang spelling instead?

851

Be sure to add a test for using this attribute with the C++ spelling, as I'm not certain how well we would parse something like [[gnu::cpu_specific(ivybridge)]] currently (it may just work, however).

Also, why an identifier instead of a string literal?

864

Same here.

include/clang/Basic/AttrDocs.td
225–226

How empty is empty? e.g., is it lexically empty (no tokens whatsoever) or notionally empty (empty after preprocessing)?

__attribute__((cpu_dispatch(atom))) void multi_cpu_1(void) {
  /* Is this empty enough? */
}

__attribute__((cpu_dispatch(atom))) void multi_cpu_2(void) {
  #if 0
    #error How about this?
  #endif
}
lib/Parse/ParseDecl.cpp
209

This comment seems to be in the wrong place.

lib/Sema/SemaDeclAttr.cpp
1853–1854

Better to just use cast<FunctionDecl>(D) here as it already asserts properly.

utils/TableGen/ClangAttrEmitter.cpp
2130

Please don't use Attr as a local name; it's a type.

erichkeane marked 11 inline comments as done.

Fix based on @craig.topper and @aaron.ballman s comments.

Responding to comments :)

include/clang/Basic/Attr.td
850

Yep, you're right. I'd forgotten about that differentiation here. Thanks!

851

I'll add it, presumably as 'clang::cpu_specific'. The decision for string-literal vs identifier was made quite a few years before I was here sadly. I believe the EDG FE doesn't make identifiers any more difficult so the implementer here chose to make it that way.

In this case, I'd very much like to keep it with the same implementation as ICC, simply because users of this are already familiar with it in this form.

include/clang/Basic/AttrDocs.td
225–226

Well, ANY are permissible (even non-empty bodies...) since I issue a warning if I detect that it is not empty. I detect it at the AST level, so anything that doesn't add to the AST isn't counted against 'empty'. In this case, those two are both empty.

Do you have a suggestion on how I can word this more clearly?

aaron.ballman added inline comments.May 31 2018, 4:54 AM
include/clang/Basic/Attr.td
851

In this case, I'd very much like to keep it with the same implementation as ICC, simply because users of this are already familiar with it in this form.

The compatibility with ICC is important for the GNU-style attribute, but for the C++ spelling this is novel territory where there is no compatibility story. Another approach to consider is whether to accept identifiers or string literals depending on the spelling, but that might not be worth it.

include/clang/Basic/AttrDocs.td
225–226

Would this be accurate?

Functions marked with ``cpu_dispatch`` are not expected to be defined, only declared. If such a marked function has a definition, any side effects of the function are ignored; trivial function bodies are permissible for ICC compatibility.

(Question: if this is true, should you also update FunctionDecl::hasTrivialBody() to return true for functions with this attribute?)

erichkeane added inline comments.May 31 2018, 6:09 AM
include/clang/Basic/Attr.td
851

I'd like to think about that... I could forsee accepting BOTH forms, simply because it would slightly simplify the conversion from an attribute-target-mv situation, though I'm not sure it is important enough to do.

include/clang/Basic/AttrDocs.td
225–226

That is accurate and elegantly describes the behavior. I don't think it makes sense to mark the body 'trivial' since it actually WILL have a body, its just emitted based on the attribute rather than the user-defined body. I'd also fear some of the side-effects of permitting someone to detect it as 'trivial' for SFINAE purposes.

Shamelessly stealing @aaron.ballman s AttrDocs description of cpu_dispatch bodies.

Just a quick bump, I know most of you are still recovering from the Switzerland trip, but hopefully the others will have some time.

Some drive-by nits.
General observation: this is kinda large.
I would personally split split off the codegen part into a second patch.

include/clang/AST/Decl.h
2212–2213

Everything else has comments, and i'm not sure it is obvious what those mean without knowing the context.

include/clang/Basic/AttrDocs.td
223

s/compilation/source code/?

include/clang/Basic/X86Target.def
288

This is going to go stale.
It already does not have znver1, btver2.
Surely this info already exists elsewhere in llvm?
Can it be deduplicated somehow?

lib/CodeGen/CodeGenFunction.cpp
2426

Those are arbitrary names, right?
Maybe somehow convey that this is CPUDispatchMultiVersionResolver logic in the names?

lib/CodeGen/CodeGenModule.cpp
867

I think

return Twine(".", Target.CPUSpecificManglingCharacter(Name)).str();
erichkeane marked 4 inline comments as done.Jul 9 2018, 8:02 AM

Some drive-by nits.
General observation: this is kinda large.
I would personally split split off the codegen part into a second patch.

Thanks for the review! I definitely agree this is a sizable patch, I'm not sure the non-codegen part has any value without the codegen so I'm not sure that splitting it would provide value. I can definitely do so if reviewers in general believe this is the right tack.

include/clang/Basic/X86Target.def
288

You'll note that this doesn't have any non-intel names :) This is non-Intel processor friendly version of the ICC implementation of the same feature. Because of this, I don't have a good feature list for those.

I'd searched in a few places for an alternate source for these names/feature lists. I was unable to find anywhere else that needs/has similar information, though any alternate sources are greatly appreciated.

I'll note that the name and mangling-name aren't going to be able to be removed, though perhaps there is a source of these features somewhere?

lib/CodeGen/CodeGenModule.cpp
867

A twine of char + a char is actually pretty challenging... I rewrote it without the intermediate string though. I am generally unconcerned, since 2 chars is very much in the 'short string optimization' range of all implementations.

erichkeane updated this revision to Diff 154610.Jul 9 2018, 8:04 AM
erichkeane marked an inline comment as done.

@lebedev.ri 's comments, plus a rebase.

Bump! Would love to have someone take a look!

aaron.ballman added inline comments.Jul 18 2018, 6:09 AM
include/clang/AST/Decl.h
2212–2213

Thoughts on isCPUDispatchMultiVersion() instead of isCpuDispatchMultiVersion()?

include/clang/Basic/Attr.td
851

I'm okay with the current approach that uses identifiers only. We can relax the rule to allow string literals if it turns out there is user demand for such a thing.

851

Cpus -> CPUs ?

865

Cpus -> CPUs ?

include/clang/Basic/AttrDocs.td
247

I'd drop the bit about "which is the condition-less implementation". It reads a bit oddly to begin with and doesn't really add much to the explanation.

lib/CodeGen/CodeGenModule.cpp
866

Don't use auto here.

2446

cast<> already asserts this.

2457

Can use auto * here.

2464

const IdentifierInfo *

lib/Sema/Sema.cpp
1734

const Expr *?

lib/Sema/SemaDecl.cpp
9657

Spurious newline? Also, else after a return.

9684

s/if The/If the

9710

Spurious newline.

9780

else after return.

lib/Sema/SemaOverload.cpp
9023

If there's no mismatch, doesn't this wind up dereferencing the end iterator of the range?

erichkeane marked 13 inline comments as done.

All of @aaron.ballman s comments have been dealt with except for Cpus->CPUs in the Attr.td file for explained reasons. Still negotiable :)

include/clang/AST/Decl.h
2212–2213

Well, I can definitely switch it back if CPU is your preference. Looking at it, we're probably about 75/25 in our codebase preferring CPU, so I prefer it as well.

include/clang/Basic/Attr.td
851

So, I agree with you we should be consistent with spelling. However, in this case, I'd prefer not changing this one. This is what the generated code in Attrs.inc looks like if I change it:

typedef IdentifierInfo ** cPUs_iterator;
cPUs_iterator cPUs_begin() const { return cPUs_; }
cPUs_iterator cPUs_end() const { return cPUs_ + cPUs_Size; }
unsigned cPUs_size() const { return cPUs_Size; }
llvm::iterator_range<cPUs_iterator> cPUs() const { return llvm::make_range(cPUs_begin(), cPUs_end()); }

I think having "Cpus" in 2 places is way better than having to spell it as cPUs_begin. Your thoughts?

lib/Sema/SemaOverload.cpp
9023

Yikes, you're right! However, that would require 2 declarations to have identical cpu_specific lists, which is against the rules previously enforced. I've put an assert in here to ensure that never happens.

aaron.ballman added inline comments.Jul 18 2018, 11:54 AM
include/clang/Basic/Attr.td
851

Oh wow that is disgusting. Yes, please leave as Cpus here.

erichkeane added inline comments.Jul 18 2018, 12:58 PM
include/clang/Basic/Attr.td
851

:-D Already Done!

aaron.ballman accepted this revision.Jul 18 2018, 1:20 PM

Aside from an assert than can be removed, this LGTM on the attribute side of things.

lib/CodeGen/CodeGenModule.cpp
2446

I don't think this is done -- the assert can be removed.

This revision is now accepted and ready to land.Jul 18 2018, 1:20 PM
This revision was automatically updated to reflect the committed changes.