This is an archive of the discontinued LLVM Phabricator instance.

Implement Function Multiversioning for Non-ELF Systems.
ClosedPublic

Authored by erichkeane on Oct 23 2018, 9:36 AM.

Details

Summary

Similar to how ICC handles CPU-Dispatch on Windows, this patch uses the
resolver function directly to forward the call to the proper function.
This is not nearly as efficient as IFuncs of course, but is still quite
useful for large functions specifically developed for certain
processors.

This is unfortunately still limited to x86, since it depends on
builtin_cpu_supports and builtin_cpu_is, which are x86 builtins.

The naming for the resolver/forwarding function for cpu-dispatch was
taken from ICC's implementation, which uses the unmodified name for this
(no mangling additions). This is possible, since cpu-dispatch uses '.A'
for the 'default' version.

In 'target' multiversioning, this function keeps the '.resolver'
extension in order to keep the default function keeping the default
mangling.

Diff Detail

Repository
rL LLVM

Event Timeline

erichkeane created this revision.Oct 23 2018, 9:36 AM
erichkeane added a subscriber: mibintc.
rnk added a comment.Oct 23 2018, 4:20 PM

Seems reasonable. Should the resolver still be called ?foo@.resolver, or should it get a new name, since it is quite functionally different now?

lib/CodeGen/CodeGenFunction.cpp
2381 ↗(On Diff #170681)

Rather than templating over IRBuilder and CGBuilderTy, I'd standardize on CGBuilderTy.

2391–2393 ↗(On Diff #170681)

Surely this would be simpler as a range for or Args{Resolver->arg_begin() ...};

2395 ↗(On Diff #170681)

This approach is... not going to work in the general case. Consider, for example, varargs. Then consider 32-bit x86 inalloca, which is quite widespread. Any non-trivially copyable object is not going to be passable through such a thunk. You might consider looking at CodeGenFunction::EmitMustTailThunk instead.

2432 ↗(On Diff #170681)

You can make this a CGBuilderTy. Just pass it a type cache.

test/CodeGenCXX/attr-target-mv-overloads.cpp
74 ↗(On Diff #170681)

Does dumpbin demangle these names correctly? Just curious.

erichkeane marked 2 inline comments as done.Oct 24 2018, 6:17 AM
In D53586#1273546, @rnk wrote:

Seems reasonable. Should the resolver still be called ?foo@.resolver, or should it get a new name, since it is quite functionally different now?

I'm not attached to the name. I didn't have a good alternative (I'd thought of 'dispatcher' at one point, but figured that was because my mind was attached to cpu-dispatch). Implementation wise, '.resolver' is slightly easier, but not enough to motivate the discussion.

If you like '.dispatcher' instead of '.resolver', I can do that, otherwise a brainstorming session would be appreciated!

lib/CodeGen/CodeGenFunction.cpp
2391–2393 ↗(On Diff #170681)

Surely... Unfortunately the types don't match. Resolver Args are an array of Arguments (so the iterator is a Arg*).

HOWEVER, "Args" here needs to be an array ref to Value*, so the iterator would have to be a Value** (not a Arg*/Value*). The point of this small vector is to create an array of _pointers_ from the array of Arguments.

2395 ↗(On Diff #170681)

Oh dear...
I'm unfamiliar with EmitMustTailThunk, is it self explanatory? Any chance you can expound? Should I call/use that function, or copy out of it?

2432 ↗(On Diff #170681)

I'd tried that a while and could never find a type cache to use. I'd not realized until just now (searching more for usages) that CodeGenFunction is one!

test/CodeGenCXX/attr-target-mv-overloads.cpp
74 ↗(On Diff #170681)

I don't know, I'd suspect not. However, demangler.com (not sure what it uses?) seems to demangle them ignoring everything after the '.'.

erichkeane marked an inline comment as done.Oct 24 2018, 8:26 AM

@rnk: I looked into the EmitMustTailThunk code, and don't terribly see what I'm doing wrong. I clearly cannot CALL that function (since this works with non-CXXMethodDecls), so I looked into using it. However, I saw that it does a couple of small things that seem useful, but not anything that seems to fix your concerns.

Can you espouse a bit for me? I'm likely missing something, but am out of ideas on what it could be.

lib/CodeGen/CodeGenFunction.cpp
2395 ↗(On Diff #170681)

I looked through that function, and it seems to do very similar things to this...

It FIRST does the small-vector conversion to a value* array like I do.

Second, it 'adjusts' the 'this' parameter. This doesn't seem like it needs to happen, because the type isn't changing, right? my 'this' pointer is still pass-on-able.

Third, it creates the call, then marks it TCK_MustTail.

Forth, it sets the attributes/calling convention of the 'call' object, but I'd expect that to come from the llvm::Function object in the 'CreateCall', right? I can add them after the fact I guess, if we see value.*(see below)

Finally, it does the CreateRetVoid/CreateRet like this function does below.

*I DO have a test that claims that my IR is broken if I set the call attributes. Curiously only 1 of my tests, but the error is:
"cannot guarantee tail call due to mismatched ABI impacting function attributes"

The only difference in attributes is target-features and target-cpu though, so I don't know what causes this.

rnk added inline comments.Oct 24 2018, 2:27 PM
lib/CodeGen/CodeGenFunction.cpp
2391–2393 ↗(On Diff #170681)

Right. :)

2395 ↗(On Diff #170681)

Right, the this-adjustment is definitely unnecessary for this kind of CPU dispatch thunk.

The important thing is that marking the call as musttail ensures that you get perfect forwarding of all arguments, even varargs and inalloca. Maybe it's enough to just mark the call you are creating that way. I think the code you are generating obeys all the musttail invariants, so that might be best.

The "cannot guarantee tail call due to mismatched ABI impacting function attributes" verifier check is only supposed to fire when parameter attributes like byval and inreg differ between the call site and the dispatcher function. It shouldn't care about target-cpu or target-features.

test/CodeGenCXX/attr-target-mv-overloads.cpp
74 ↗(On Diff #170681)

Neat.

did everything suggested by @rnk as far as I know. Thanks again for the reviews!

echristo accepted this revision.Oct 24 2018, 2:54 PM

All of the target specific stuff looks fine to me. I'm going to defer to rnk about the windows side of things and aaron for the attributes.

This revision is now accepted and ready to land.Oct 24 2018, 2:54 PM
rnk added inline comments.Oct 24 2018, 4:43 PM
lib/CodeGen/CodeGenFunction.cpp
2395 ↗(On Diff #170681)

It's probably worth adding a test that uses inalloca, since that's why we're doing this nonsense. It would look like:

struct Foo {
  Foo();
  Foo(const Foo &o);
  ~Foo();
  int x;
};
int __attribute__((target("default"))) bar(Foo o) { return o.x; }
int __attribute__((target("sse4.2"))) bar(Foo o) { return o.x + 1; }
int __attribute__((target("arch=ivybridge"))) bar(Foo o) { return o.x + 2; }

Targetting i686-windows-msvc.

erichkeane marked an inline comment as done.

Added test as requested by @rnk. How's it look? I hope I got the balance of check-lines right.

ACTUALLY add the test this time :/ sorry about that!

rnk accepted this revision.Oct 25 2018, 11:21 AM

Here's a thought. What happens if different TUs observe different overload sets, perhaps because of ifdefs? Different TUs will generate different resolvers, but they won't dispatch to the same sets of targets. I'm guessing we'd treat that as an ODR violation, no diagnostic required?

Anyway, I think this is ready to commit. Thanks!

test/CodeGen/attr-target-mv-va-args.c
41–44 ↗(On Diff #171118)

Probably worth matching out the musttail for variadics, since the ellipsis doesn't make sense without it.

test/CodeGenCXX/attr-target-mv-inalloca.cpp
45–47 ↗(On Diff #171118)

I think it's worth matching the retval and doing CHECK-NEXT for the ret, since that will check that the call is in fact in tail position, which is a verifier requirement for musttail.

In D53586#1276198, @rnk wrote:

Here's a thought. What happens if different TUs observe different overload sets, perhaps because of ifdefs? Different TUs will generate different resolvers, but they won't dispatch to the same sets of targets. I'm guessing we'd treat that as an ODR violation, no diagnostic required?

Anyway, I think this is ready to commit. Thanks!

Yep, pretty much. Its a bit of a weakness of the 'target' mv (since dispatch requires you to explicitly list), but I believe it is a trade off that gcc felt was acceptable.x`x`

This revision was automatically updated to reflect the committed changes.