This is an archive of the discontinued LLVM Phabricator instance.

Utils: Add utility pass to lower ifuncs
ClosedPublic

Authored by arsenm on Dec 1 2022, 5:56 PM.

Details

Summary

Create a global constructor which will initialize a global table of
function pointers. For now, this is only used as a reduction technique
for llvm-reduce.

In the future this may be useful to support ifunc on systems where the
program loader doesn't natively support it.

Diff Detail

Event Timeline

arsenm created this revision.Dec 1 2022, 5:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 5:56 PM
arsenm requested review of this revision.Dec 1 2022, 5:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 5:56 PM
Herald added a subscriber: wdng. · View Herald Transcript
aeubanks added inline comments.Dec 12 2022, 7:09 PM
llvm/lib/Transforms/Utils/LowerIFunc.cpp
22

this extra function seems unnecessary

23

leftover code?

llvm/lib/Transforms/Utils/ModuleUtils.cpp
437

making this positive is probably less likely to go wrong if some backend assumes non-negative priorities. I think 10 is probably good?

llvm/test/Transforms/LowerIFunc/ifunc-constantexpr.ll
3 ↗(On Diff #479491)

this is a verifier test? and should already be in tree as one

arsenm updated this revision to Diff 482345.Dec 12 2022, 7:37 PM
arsenm marked 4 inline comments as done.

Garbage collect. Also update_test_checks global behavior seems to have slightly changed recently

arsenm added inline comments.Dec 12 2022, 7:37 PM
llvm/test/Transforms/LowerIFunc/ifunc-constantexpr.ll
3 ↗(On Diff #479491)

Leftover from before my verifier patch to reject this

To answer your questions in the comments about what to do about resolvers with arguments: At least glibc always calls ifunc resolvers without any arguments. It just reads the address of the resolver function from the ELF file, casts it to a pointer to a void-returning function with no arguments and calls it.

So, I agree: Resolvers with arguments should not be allowed.

To answer your questions in the comments about what to do about resolvers with arguments: At least glibc always calls ifunc resolvers without any arguments. It just reads the address of the resolver function from the ELF file, casts it to a pointer to a void-returning function with no arguments and calls it.

So, I agree: Resolvers with arguments should not be allowed.

@rjmccall, I'm trying to remember whether Mach-O has an ifunc-like thing that takes an argument.

llvm/lib/Transforms/Utils/ModuleUtils.cpp
388–397

This is where ifuncs with arguments are handled. @rjmccall, thoughts on the FIXME?

I do think the IR for resolvers with arguments should be rejected, but we probably need clang to hard error first: https://github.com/llvm/llvm-project/issues/59167

To answer your questions in the comments about what to do about resolvers with arguments: At least glibc always calls ifunc resolvers without any arguments. It just reads the address of the resolver function from the ELF file, casts it to a pointer to a void-returning function with no arguments and calls it.

So, I agree: Resolvers with arguments should not be allowed.

@rjmccall, I'm trying to remember whether Mach-O has an ifunc-like thing that takes an argument.

I don't know that we support ifuncs of any kind, but I'll ask.

llvm/lib/Transforms/Utils/ModuleUtils.cpp
388–397

If possible, I agree that we should treat this as invalid and diagnose in the frontend. However, it's possible that the current leniency puts us in a compatibility bind. What would we do if the frontend is forced to accept resolver functions with parameters? Can we substitute a thunk in the frontend that passes undefined arguments, or would that break semantics somehow?

Also, please fix this code to say "ResolverFunction" instead of "ResolvedFunction". That is an extremely important difference.

To answer your questions in the comments about what to do about resolvers with arguments: At least glibc always calls ifunc resolvers without any arguments. It just reads the address of the resolver function from the ELF file, casts it to a pointer to a void-returning function with no arguments and calls it.

So, I agree: Resolvers with arguments should not be allowed.

@rjmccall, I'm trying to remember whether Mach-O has an ifunc-like thing that takes an argument.

I don't know that we support ifuncs of any kind, but I'll ask.

Okay, the official word is that Mach-O has a .symbol_resolver feature which we regret adding because it adds per-call overheads that make it slower than just switching yourself in typical code patterns. As far as I know, that feature has never been hooked up to __attribute__((ifunc)), which is marked in Clang as a target-specific attribute for ELF targets, and so there is no need to consider Mach-O when changing the LLVM ifunc support.

Neither arc patch D139163 nor curl -L 'https://reviews.llvm.org/D139163?download=1' | patch -p1 applies the patch cleanly. Please rebase.

If you upload a patch with arc diff 'HEAD^', arc patch D139163 will very reliably apply it at the specified base commit.

MaskRay added a comment.EditedDec 21 2022, 10:21 PM

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

A dynamic loader knows some properties like HWCAP. Except legacy x86 ifunc implementation, almost all glibc ports pass dl_hwcap to the ifunc resolver and some newer ports (aarch64,loongarch, well riscv is unfortunate) even make the struct extensible.
Lowering ifunc in IR makes such information unavailable.

Note that an ifunc can be emulated by a global function pointer which causes no difference at a call site.
There is more flexibility: the program can initialize at any time which is appropriate and provide arbitrary arguments.
It seems that no project uses __attribute__((ifunc)). The reason is in some part related to the inflexibility.
Because of initialization order fiasco, it's infeasible to initialize an ifunc with information from another module linked into the same executable/DSO.

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

A dynamic loader knows some properties like HWCAP. Except legacy x86 ifunc implementation, almost all glibc ports pass dl_hwcap to the ifunc resolver and some newer ports (aarch64,loongarch, well riscv is unfortunate) even make the struct extensible.
Lowering ifunc in IR makes such information unavailable.

Note that an ifunc can be emulated by a global function pointer which causes no difference at a call site.
There is more flexibility: the program can initialize at any time which is appropriate and provide arbitrary arguments.
It seems that no project uses __attribute__((ifunc)). The reason is in some part related to the inflexibility.
Because of initialization order fiasco, it's infeasible to initialize an ifunc with information from another module linked into the same executable/DSO.

(Edited the previous comment a bit.)

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

I'm considering it as an option to deal with some of the builtin library complexity, so it would be used with a restricted set of known functions. I'm not planning on exposing this as a user facing feature. The resolver would be known trivial after link and optimize out most of the time, but needs a fallback where a global constructor will do just fine.

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

I'm considering it as an option to deal with some of the builtin library complexity, so it would be used with a restricted set of known functions. I'm not planning on exposing this as a user facing feature. The resolver would be known trivial after link and optimize out most of the time, but needs a fallback where a global constructor will do just fine.

Is there a reason that reusing ifunc is better than a global function pointer? Here is an example of a global function pointer.
I just added it to https://maskray.me/blog/2021-01-18-gnu-indirect-function#alternative

void impl();
void (*resolver())() { return impl; }

extern void (*const ifunc)() = resolver();

void use() {
  ifunc();
  ifunc();
  // ifunc = nullptr; // do not compile
}

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

I'm considering it as an option to deal with some of the builtin library complexity, so it would be used with a restricted set of known functions. I'm not planning on exposing this as a user facing feature. The resolver would be known trivial after link and optimize out most of the time, but needs a fallback where a global constructor will do just fine.

Is there a reason that reusing ifunc is better than a global function pointer? Here is an example of a global function pointer.
I just added it to https://maskray.me/blog/2021-01-18-gnu-indirect-function#alternative

First of all, the language doesn't allow function pointers. Second, the idea would be to let clang emit the resolver with target_clones.

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

I'm considering it as an option to deal with some of the builtin library complexity, so it would be used with a restricted set of known functions. I'm not planning on exposing this as a user facing feature. The resolver would be known trivial after link and optimize out most of the time, but needs a fallback where a global constructor will do just fine.

Is there a reason that reusing ifunc is better than a global function pointer? Here is an example of a global function pointer.
I just added it to https://maskray.me/blog/2021-01-18-gnu-indirect-function#alternative

First of all, the language doesn't allow function pointers. Second, the idea would be to let clang emit the resolver with target_clones.

For the moment this is just used as a strategy for llvm-reduce to delete ifuncs, I haven't started work on really using it.

What's the motivation for piggybacking the functionality on top of ifunc? Does AMDGPU want to use ifunc to do something special?

I'm considering it as an option to deal with some of the builtin library complexity, so it would be used with a restricted set of known functions. I'm not planning on exposing this as a user facing feature. The resolver would be known trivial after link and optimize out most of the time, but needs a fallback where a global constructor will do just fine.

Is there a reason that reusing ifunc is better than a global function pointer? Here is an example of a global function pointer.
I just added it to https://maskray.me/blog/2021-01-18-gnu-indirect-function#alternative

First of all, the language doesn't allow function pointers. Second, the idea would be to let clang emit the resolver with target_clones.

Which language doesn't allow function pointers but supports the function pointer array made possibly by this patch?
If such a language exists and the indirect function semantics is intended, I think the right call is to support function pointers, either a generic one which can be used as local variables, or a special one which can only be used in global variables.
Changing the language to allow __attribute__((ifunc)) and LLVM IR ifunc is an extension as well. For the reasons I mentioned, and Mach-O folks comments, I wish that there is some clue that ifunc is probably not a good user feature.

Which language doesn't allow function pointers but supports the function pointer array made possibly by this patch?

OpenCL forbids function pointers. If you use target_clones in clang, there's no function pointer in the source and the compiler takes care of everything. I want to patch up subtarget specific implementations of builtin functions. It's not for end users

Which language doesn't allow function pointers but supports the function pointer array made possibly by this patch?

OpenCL forbids function pointers. If you use target_clones in clang, there's no function pointer in the source and the compiler takes care of everything. I want to patch up subtarget specific implementations of builtin functions. It's not for end users

(Unfamiliar with OpenCL)

D51650+D122958 (target_clones) implement target_clones with ifunc and a fallback when supportsIFunc() == false.

There is some inefficiency that the resolver is repeatedly called.

int __attribute__((target_clones("sse4.2, default"))) foo(void) { return 0; }
// clang --target=x86_64-pc-fuchsia -O2 -S a.c   // newer than https://reviews.llvm.org/D127933

If we cache the resolver, do you still need ifunc?

There is some inefficiency that the resolver is repeatedly called.

int __attribute__((target_clones("sse4.2, default"))) foo(void) { return 0; }
// clang --target=x86_64-pc-fuchsia -O2 -S a.c   // newer than https://reviews.llvm.org/D127933

If we cache the resolver, do you still need ifunc?

The resolver will fold to a compile time return of a global address. I have another patch to optimize to a direct call. the ifunc / target_clones is for library source and distribution organization issues. I do not care about the performance of the resolver. We don't have compatible ISAs (other than for restricted subsets), so care is needed in the odd case where we would execute a resolver (it mostly works out within a generation. I also think a sufficiently magical piece of asm could work on all subtargets)

You previously said "OpenCL forbids function pointers. If you use target_clones in clang, there's no function pointer in the source and the compiler takes care of everything. I want to patch up subtarget specific implementations of builtin functions. It's not for end users"

I take it that if target_clones supports non-ifunc use, this patch is not needed. It seems that target_clones indeed works without ifunc (inefficient now, but can be improved), so may I assume that this is not needed?

You previously said "OpenCL forbids function pointers. If you use target_clones in clang, there's no function pointer in the source and the compiler takes care of everything. I want to patch up subtarget specific implementations of builtin functions. It's not for end users"

I take it that if target_clones supports non-ifunc use, this patch is not needed. It seems that target_clones indeed works without ifunc (inefficient now, but can be improved), so may I assume that this is not needed?

This is still a way for llvm-reduce to break this into something else it understands how to reduce. This patch does not try to run this anywhere, but adds the testing for the situations for the reduction

You previously said "OpenCL forbids function pointers. If you use target_clones in clang, there's no function pointer in the source and the compiler takes care of everything. I want to patch up subtarget specific implementations of builtin functions. It's not for end users"

I take it that if target_clones supports non-ifunc use, this patch is not needed. It seems that target_clones indeed works without ifunc (inefficient now, but can be improved), so may I assume that this is not needed?

This is still a way for llvm-reduce to break this into something else it understands how to reduce. This patch does not try to run this anywhere, but adds the testing for the situations for the reduction

Yes, fair. Then update the description?

Things like "For AMDGPU, ideally we would have a host-side resolver function, but failing that, we can have a device side" no longer apply.

arsenm updated this revision to Diff 485165.Dec 23 2022, 3:18 PM
arsenm edited the summary of this revision. (Show Details)

Change description

aeubanks accepted this revision.Jan 16 2023, 5:10 PM
This revision is now accepted and ready to land.Jan 16 2023, 5:10 PM
arsenm closed this revision.Jan 17 2023, 7:34 PM

a97ab6ebb43d76548b3340b96e2aba936e6ca415