This is an archive of the discontinued LLVM Phabricator instance.

Implement target_clones multiversioning
ClosedPublic

Authored by erichkeane on Sep 4 2018, 11:41 AM.

Details

Summary

As discussed here: https://lwn.net/Articles/691932/

GCC6.0 adds target_clones multiversioning. This functionality is
an odd cross between the cpu_dispatch and 'target' MV, but is compatible
with neither.

This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.

The mangling however, is... ODD to say the least. The mangling format is:
<normal_mangling>.<option string>.<option ordinal>.

However, the 'option ordinal' is where it gets strange. When parsing the
list, 'default' is moved to the end, so "foo,default,bar", foo is 0, bar is 1,
and default is 2.

Otherwise, emission rules should be the same as 'target'.

Diff Detail

Event Timeline

erichkeane created this revision.Sep 4 2018, 11:41 AM

Thank you for this! I have some cursory review comments, and possibly more later.

include/clang/Basic/AttrDocs.td
1600 ↗(On Diff #163880)

It supports the C++ style as well, so perhaps "Clang supports the `target_clones("OPTIONS")` attribute." instead?

1601 ↗(On Diff #163880)

It can be on a declaration too, can't it?

1602 ↗(On Diff #163880)

Extraneous space between "versions of"

1607 ↗(On Diff #163880)

You should probably explain the options a bit more thoroughly around here; nothing explains that the options are architectures, for instance.

1612 ↗(On Diff #163880)

typo: disregarding

lib/CodeGen/CodeGenModule.cpp
2394 ↗(On Diff #163880)

Spurious newline?

2559 ↗(On Diff #163880)

const auto *

lib/Sema/SemaDecl.cpp
9428 ↗(On Diff #163880)

Rather than repeat this expression in multiple places, I prefer a named local variable.

lib/Sema/SemaDeclAttr.cpp
3014 ↗(On Diff #163880)

remove -> Remove

erichkeane marked 9 inline comments as done.

fix aaron's comments.

dblaikie removed a reviewer: echristo.
dblaikie added a subscriber: echristo.

Based on feedback from @rsmith I've been trying to get this to work with Lambdas like GCC does, though I'm quite confused about how to go about it here. It seems that the lambda definition call operator is deferred, however I'm having a hard time making sure that this happens in the "IFunc" case. I'm still working on it, but it might be a bit.

rsmith added inline comments.Sep 10 2018, 11:14 AM
include/clang/Basic/Attr.td
2031–2042 ↗(On Diff #164251)

Sorry, I don't think it's acceptable from a design perspective to have mutable state in an Attr object, even if you can ensure that only one client of the Attr will be using the state at a time. CodeGen is going to need to track its own index into the attribute's list of clones.

include/clang/Basic/AttrDocs.td
1619–1621 ↗(On Diff #164251)

Can we warn on attributes that contain multiple strings where one or more such strings contains a comma? That seems like it would always be user error.

I think I'd then prefer to document this as: "The versions can either be listed as a comma-separated sequence of string literals or as a single string literal containing a comma-separated list of versions. For compatibility with GCC, the two formats can be mixed. For example, the following will emit 4 versions of the function:"

lib/CodeGen/CodeGenModule.cpp
961–970 ↗(On Diff #164251)

This chain of ifs and similar things elsewhere make me think we're missing an abstraction. Perhaps FunctionDecl should have a getMultiVersionAttr() and/or getMultiVersionKind() functions?

1043 ↗(On Diff #164251)

This target index should be part of the GlobalDecl, not tracked alongside it, because it identifies which global we're talking about.

lib/Sema/SemaDecl.cpp
9800–9807 ↗(On Diff #164251)

We should also disallow multiple TargetClonesAttrs on the same declaration. GCC allows this and then ignores all but the last such attribute; we can do better.

9811–9812 ↗(On Diff #164251)

Why can't main be multiversioned? That seems like an arbitrary restriction.

9839 ↗(On Diff #164251)

I would expect the plain English term to not have interior capitalization.

lib/Sema/SemaDeclAttr.cpp
3011 ↗(On Diff #164251)

Should be something like checkTargetClonesAttrStr to indicate that this is checking one specific string, not the entire attribute.

3018–3029 ↗(On Diff #164251)

When looping through comma-delimited strings, it's easier to use StringRef::split:

std::pair<StringRef, StringRef> Parts = {{}, Str};
while (!Parts.second.empty()) {
  Parts = Parts.second.split(',');
  StringRef Cur = Parts.first.trim();
  // ...
}
3035–3043 ↗(On Diff #164251)

Rather than just repeating the problematic portion of the string textually in the diagnostic, pass the StringLiteral into here and use StringLiteral::getLocationOfByte to highlight the problematic source range within the string.

3068–3071 ↗(On Diff #164251)

Do we need any other validation here? What if there are duplicate versions?

test/CodeGen/attr-cpuspecific.c
14–15 ↗(On Diff #164251)

Should this really have external linkage? These . suffixes are reserved for vendor-specific manglings that typically should only be used for symbols with internal linkage.

If you really want to give these symbols external linkage, I think they should at least be put in a COMDAT keyed off the primary symbol so that we don't get a mishmash of different suffix combinations from different compilers.

Thanks for the feedback @rsmith! I'm working through the lambda issue and a few other things, but will get to this as soon as I can.

include/clang/Basic/Attr.td
2031–2042 ↗(On Diff #164251)

Alright, I'll see if I can figure that out. I should probably do something similar for the cpu-dispatch, since it actually uses a very similar mechanism.

include/clang/Basic/AttrDocs.td
1619–1621 ↗(On Diff #164251)

I don't see why not, the warning should be a fairly trivial addition.

lib/CodeGen/CodeGenModule.cpp
961–970 ↗(On Diff #164251)

I'm not super sure what either buys us... The multiversion attributes are all somewhat different unfortunately, so they would need to be dispatched separately later. The 'getMultiVersionKind' is perhaps useful, though its essentially what isXXXMultiVersion does. I'll think on it, I agree that there is likely an abstraction somewhere between that can improve this...

1043 ↗(On Diff #164251)

I see. I can definitely do that, I was concerned that adding an int to the GlobalDecl would be an unacceptable increase in size.

lib/Sema/SemaDecl.cpp
9811–9812 ↗(On Diff #164251)

At the time of implementing 'target', I was unsure of (and still am) how to accomplish this. It would seem that I'd need to make the entry point a wrapper that calls the ifunc. GCC seems to improperly call this (it doesn't emit the 'main' fucntion as far as I can tell).

lib/Sema/SemaDeclAttr.cpp
3068–3071 ↗(On Diff #164251)

GCC has this annoying feature of just IGNORING duplicate versions. I likely could/should warn about this, but for GCC compat we probably want to support this.

test/CodeGen/attr-cpuspecific.c
14–15 ↗(On Diff #164251)

I guess I don't really have a good reason to not just pick up the linkage from the original. I'll check to see if I can figure out how I got external linkage in the first place.

rsmith added inline comments.Sep 10 2018, 1:32 PM
lib/CodeGen/CodeGenModule.cpp
961–970 ↗(On Diff #164251)

The idea would be that we have an enum that we can perform a covered switch over, so we don't need to remember to update all the relevant places when we add another kind of multiversioning. You already have the code to work out the kind of multiversioning; moving it from SemaDecl.cpp to FunctionDecl then using it where it makes sense in this patch would help, I think.

lib/Sema/SemaDecl.cpp
9811–9812 ↗(On Diff #164251)

OK, if there's actually a problem with having the main symbol be an ifunc, that seems like a perfectly legitimate reason for the restriction.

erichkeane added a subscriber: grooverdan.

Fix @rsmith s comments, rebase on the big CPUDispatch refactor.

Fix @rsmith s comments, rebase on the big CPUDispatch refactor.

Ping. What's the status here?

Fix @rsmith s comments, rebase on the big CPUDispatch refactor.

Ping. What's the status here?

I haven't looked at it since then :) I thought it was about ready, and just needed review but this never got sufficient attention to get review. It likely needs a rebase (and I'm not sure what other changes have been made in the meantime).

Since there seems to be at least a little renewed interest in this from a handful of people lately, I spend the time to rebase this and do some minor cleanup tasks.

Since there seems to be at least a little renewed interest in this from a handful of people lately, I spend the time to rebase this and do some minor cleanup tasks.

FWIW yes, i'm being asked about this periodically (https://github.com/darktable-org/darktable/pull/7075, https://github.com/darktable-org/darktable/pull/7337#discussion_r544925151, etc)
It would be pretty good to be compatible with GCC in this.

ilinpv added a subscriber: ilinpv.Oct 6 2021, 3:42 PM
danielkiss added a comment.EditedOct 29 2021, 2:59 AM

Hi @erichkeane,

At Arm we are going to add the multiversioning support for Arm targets[1][2]. It would be nice to land this change because we could build top of it.
Please let me know how can I help.

[1] https://github.com/ARM-software/acle/pull/21
[2] https://bugs.llvm.org/show_bug.cgi?id=50400

Hi @erichkeane,

At Arm we are going to add the multiversioning support for Arm targets[1][2]. It would be nice to land this change because we could build top of it.
Please let me know how can I help.

[1] https://github.com/ARM-software/acle/pull/21
[2] https://bugs.llvm.org/show_bug.cgi?id=50400

Hi Daniel-
This patch initially required that I refactor some stuff with the existing two multiversioning mechanisms (cpu-specific and target) which happened. At a point after that, @lebedev.ri asked for me to rebase it, and I did. However, it never gained attention from reviewers, and life got in the way :)

This needs rebasing/probably some cleanup, plus code review. If you have active/proficient CFE code reviewers who could take a look, I'd be willing to spend some time rebasing.

danielkiss added a comment.EditedNov 5 2021, 4:30 AM

This needs rebasing/probably some cleanup, plus code review. If you have active/proficient CFE code reviewers who could take a look, I'd be willing to spend some time rebasing.

Thank you, I will help with the review and ask around here for more eyes.

erichkeane updated this revision to Diff 385089.Nov 5 2021, 8:45 AM

Another rebase, as requested. I am not particularly familiar with this code anymore, so my responses to reviews might not be particularly well informed, but I'm hoping that rust shakes off through the review :)

FWIW I'm a bit rusty in this area myself, but thanks for doing this. Let's see if we can't get Aaron to continue reviewing :)

Sadly, I think _I_ am the multiversioning expert (or at least, past-me was), so I'm hoping some of the reviewers @danielkiss can get to join will be able to read/understand this stuff for a quality review.

I think you are these days too :) My offer was "past-past-me". I think
you're probably ok here, I did a rough scan, but getting someone like Aaron
for the attribute support would be good.

Thanks for the rebase!

Looks got to me,

-march may make the default same as one of the clone, in this case maybe we don't need to create two versions of the function. This could be solved later IMHO.

clang/lib/Sema/SemaDecl.cpp
10669
erichkeane marked an inline comment as done.

For the rest of multiversioning we count on the optimizer to remove variants made irrelevant, but I'm not sure opt can do anything like that yet :) But nit made.

danielkiss accepted this revision.Nov 11 2021, 5:48 AM

But nit made.

NIT: clang-format issues still present. Maybe you need to update your local clang-format.

For the rest of multiversioning we count on the optimizer to remove variants made irrelevant, but I'm not sure opt can do anything like that yet :)

I think removing the duplicated in opt is harder than just eliminate them here. At the end maybe the function does not need ifunc.

LGTM

This revision is now accepted and ready to land.Nov 11 2021, 5:48 AM

Overall, I think this LGTM, but I did find a few nits. Can you fix the clang-format issues? Also, I'd like to see some C++ test coverage that shows how this works on template (partial) specializations, lambdas (with GNU-style syntax), and overloaded functions. If we deviate from the GCC behavior in any cases, it'd be great to capture it in comments (unless you think the deviation is intentional, at which point we should document it in AttrDocs.td, or you think the deviation needs to be fixed before we land it).

clang/include/clang/Basic/Attr.td
2706
clang/include/clang/Basic/DiagnosticSemaKinds.td
11299

Should this ad hoc group be listed within the FunctionMultiVersioning group?

clang/lib/Sema/SemaDeclAttr.cpp
1968–1972

This should be handled in Attr.td via a def : MutualExclusions<[....]>; if possible.

3290

Same here.

3360

And here

erichkeane marked 5 inline comments as done.

Did all the things Aaron asked for, but required adding 'lambda not supported yet' logic for this.

Did all the things Aaron asked for, but required adding 'lambda not supported yet' logic for this.

I don't see the .cpp test file, did it get dropped by accident?

Also, pre-commit isn't able to apply the patch, so it'd be good to resolve that to get the CI testing.

clang/lib/Sema/SemaDeclAttr.cpp
1970

This diagnostic behavior is a bit unfortunate because we'll issue a diagnostic like:

// expected-error@+2 {{'target_clones' and 'target_clones' attributes are not compatible}}

Maybe we should add err_duplicate_attribute and steal its text from warn_duplicate_attribute and use getAttr() directly here?

added C++ tests this time, changed how dupes are diagnosed.

erichkeane marked an inline comment as done.Nov 11 2021, 10:27 AM
aaron.ballman accepted this revision.Nov 11 2021, 10:58 AM

LGTM aside from a nit with naming/wording. Feel free to land without additional review (unless you want more review, of course!).

clang/include/clang/Basic/DiagnosticSemaKinds.td
9868–9869

We typically don't use distinct wording for err_foo and warn_foo diagnostics -- folks tend to assume they've got the same wording and the only difference is the severity.

Potential ways forward:

  • Rename to err_duplicate_attribute_exact and use the same wording as the warning version)
  • Leave wording alone, rename to something else (perhaps err_repeated_multiversion_attribute) to make it specific to this usage

I don't insist on a particular path (or a change at all, if you have strong attachment), but I think it'd be nice to fix it before landing.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2021, 11:11 AM
jgorbe added a subscriber: jgorbe.Nov 11 2021, 5:31 PM

This breaks some existing code (we have found this issue building the JPEG XL library at github.com/libjxl). This is a very simplified version of the problem:

#pragma clang attribute push(__attribute__((target("sse2,ssse3"))), apply_to = function)
__attribute__((target("sse2"))) void f() {
}
#pragma clang attribute pop

This used to build before this patch, but now fails with

$ clang -c a.cc
a.cc:1:45: error: attribute 'target' cannot appear more than once on a declaration
#pragma clang attribute push(__attribute__((target("sse2,ssse3"))), apply_to = function)
                                            ^
a.cc:2:1: note: when applied to this declaration
__attribute__((target("sse2"))) void f() {
^
a.cc:2:16: note: conflicting attribute is here
__attribute__((target("sse2"))) void f() {
               ^
1 error generated.

Before this patch, the function-level attribute would win. Here's the relevant part of the generated IR for that example:

; Function Attrs: mustprogress noinline nounwind optnone uwtable
define dso_local void @_Z1fv() #0 {
  ret void
}

attributes #0 = { mustprogress noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

Note how there's sse2 (which was in the function-level attribute) but no ssse3 (which wasn't).

Was this semantic change intentional?

Since it is not clear whether the semantic change was intended, I think it makes sense to revert the patch for now. If it is intended, it might be good to mention it in the change description, so that people are warned.

Since it is not clear whether the semantic change was intended, I think it makes sense to revert the patch for now. If it is intended, it might be good to mention it in the change description, so that people are warned.

That looks like an unintended change to me, likely due to the new mutual exclusion checks. Thanks for letting us know!

danielkiss added inline comments.Nov 12 2021, 8:08 AM
clang/lib/Sema/SemaDeclAttr.cpp
3290–3296

This caused the breakage. This check could be dropped for the reland.
target attributes seems can be combined, others shouldn't.

Since it is not clear whether the semantic change was intended, I think it makes sense to revert the patch for now. If it is intended, it might be good to mention it in the change description, so that people are warned.

That looks like an unintended change to me, likely due to the new mutual exclusion checks. Thanks for letting us know!

It was SORTA intended, I was going for 'conservative fix' here, but the 'target' change was more of a 'while I was there' bit. I'm away for the next two weeks, but can re-land this without that plus a test to validate a winner (as that wasn't really the case before IIRC).

That said, the above example gives me some willies with regards to setting up multiversioning.... The interaction of two declarations here, 1 which adds a 'target' to the pragma-push seems fishy/confusing.

skan added a subscriber: skan.Jun 16 2023, 6:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 6:23 PM