Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.
Details
Diff Detail
Event Timeline
Changed type verification to look at the resolver operand rather than the ultimate resolver function, added comments, fixed clang CodeGenModule to give the correct type to the resolver operand.
Might still need to fix some clang tests.
Because this commit changes an existing binary bitcode file, and because that file specifically tests backwards compatibility, does that mean I need to avoid changing it and instead add a backwards compatibility fix to the BitcodeReader? (Something like always bitcasting to the required type?)
Added check to BitcodeReader, fixed clang tests, hoisted
logic to shared static function on GlobalIFunc.
Change BitcodeReader to transparently fix up the type rather than returning an error because older formats use wrong type.
So I've noticed in my downstream that this fires in the cpu-dispatch.c codegen test, though it doesn't seem to catch it here? I'm not sure how this happens, but from your description, it SEEMS like this case https://godbolt.org/z/nejWhbsxa should cause this error.
That said, the cpuspecific/cpudispatch multiviersioning takes advantage of the resolver not being implemented (and being linked in later!) in order to implement some parts of it. So I'm not sure the 'resolver requires a definition' is a valid check?
I note now that asserts build fails for it: https://godbolt.org/z/r738hGoKf
Should this be reverted?
Hmm. When I try to compile an object file where the resolver is a declaration, both clang-13, clang-14, and gcc-9.3 complain that the ifunc must point to a defined function:
void *foo_resolver(); void foo(void) __attribute__((ifunc("foo_resolver")));
clang (13 and 14) complain:
obj.c:2:31: error: ifunc must point to a defined function void foo(void) __attribute__((ifunc("foo_resolver"))); ^ 1 error generated.
gcc 9.3.0 complains:
obj.c:3:6: error: ‘foo’ aliased to undefined symbol ‘foo_resolver’ 3 | void foo(void) __attribute__((ifunc("foo_resolver"))); | ^~~
I realize that the fact that frontends reject this doesn't necessarily mean that the IR they would have hypothetically produced is invalid, I'm just wondering about the semantics.
Drawing some parallels to GlobalAliases, you'll see that they also check that the aliasee is a definition rather than a declaration (Verifier::visitAliaseeSubExpr), which was the reason I added the same check to Verifier::visitGlobalIFunc.
Should an object file be produced with an UND ifunc symbol in that case? Wouldn't it be more correct to emit a plain old function declaration? (see llvm/test/Linker/ifunc.ll for behavior of linking an ifunc on top of a function declaration, should work correctly).
At the very least to alleviate the breakage I think we can just rip out the Assert(!Resolver->isDeclarationForLinker(), "...") from the Verifier, but I have a feeling that this is not the right long-term solution.
The cpu_specific/cpu_dispatch attributes are completely new to me, so bear with me if I'm misunderstanding; wouldn't the following implementation both provide the correct semantics and avoid an ifunc-with-an-undefined-resolver situation?
- The cpu_specific attribute emits
- A Function Declaration with a computed name "x.ifunc"
- A single Function with the cpu-specific body
- Multiple GlobalAliases with computed names whose aliasee is the function from 2)
- The cpu_dispatch attribute emits a strongly-defined non-interposable ifunc with the same computed name "x.ifunc", and a hidden defined resolver. Both IR linking and regular linking should resolve the plain function-delcaration-references to the ifunc properly.
Note that (as the examples demonstrate) clang self-verifies and checks among other things that ifuncs that it emits point to definitions; this happens in CodeGenModule::checkAliases().
I haven't read the cpu_specific/cpu_dispatch-related code in CodeGenModule yet, but I'm guessing that it doesn't register the generated aliases/ifuncs into the CodeGenModule::Aliases vector for deferred verification, which is why this didn't trigger the same error that my ifunc example did so far.
My understanding is the frontend's semantic rules are/were different from the IR rules, which is why we implemented it that way.
Should an object file be produced with an UND ifunc symbol in that case? Wouldn't it be more correct to emit a plain old function declaration? (see llvm/test/Linker/ifunc.ll for behavior of linking an ifunc on top of a function declaration, should work correctly).
I'm not sure what you mean here? Are you suggesting that an undefined resolver should instead just implement an undefined 'function' for the .ifunc? This doesn't seem right? Wouldn't that cause ODR issues?
At the very least to alleviate the breakage I think we can just rip out the Assert(!Resolver->isDeclarationForLinker(), "...") from the Verifier, but I have a feeling that this is not the right long-term solution.
I guess I still don't understand what the practical limitation that requires ifuncs to have a defined resolver? The resolver is just a normal function, so it seems to me that allowing them to have normal linking rules makes sense? I personally think this is the least obtrusive change; this patch is adding a limitation that didn't exist previously unnecessarily.
The cpu_specific/cpu_dispatch attributes are completely new to me, so bear with me if I'm misunderstanding; wouldn't the following implementation both provide the correct semantics and avoid an ifunc-with-an-undefined-resolver situation?
- The cpu_specific attribute emits
- A Function Declaration with a computed name "x.ifunc"
It just seems odd I guess to name a function .ifunc, and not have it be an ifunc? What does our linker think about that?
- A single Function with the cpu-specific body
- Multiple GlobalAliases with computed names whose aliasee is the function from 2)
- The cpu_dispatch attribute emits a strongly-defined non-interposable ifunc with the same computed name "x.ifunc", and a hidden defined resolver. Both IR linking and regular linking should resolve the plain function-delcaration-references to the ifunc properly.
I'm not sure what you mean by 'non-interposable'? We are intentionally forming the .ifunc to have the same linkage as the original function, so anything we do that would break that is not acceptable. Additionally, I'd hope that this wouldn't be an ABI break? Additionally, the way the CFE generates these calls would require a pretty massive overhaul, since we'd have to "know" when to replace those in cases where the cpu-specific and cpu-dispatch are in the same TU. Previously we did something similar for attribute-target multiversioning, but the idea of doing a replace-all-uses was considered unacceptable by the CFE code owners.
Thats correct, these aren't 'aliases' or 'ifuncs' as far as the CFE is concerned; they are multiversioned functions. That 'Aliases' and 'ifunc' list in the CFE are the AST-constructs of those, not the IR constructs, so there is no reason to put the multiversioned thinks in that list, since they are implementation details. Emitting an error "invalid alias!"/etc for
To 'unbreak' us for now, I've committed the suggested change to remove the definition check here: 09233412edae388a7bfa349cf792dba5aced057f
I'll first explain my thought process about the representation of aliases and ifuncs in the IR, and why I think both aliasees and resolvers must always be defined; I hope I'm not completely off track and would love it if @MaskRay could weigh in as to whether I make sense.
Let's start at the level of the object file; My understanding is that aliases apply to ELF and MachO, and ifuncs apply only to ELF. I'm not at all acquainted with MachO, but on the ELF side, my understanding is that:
- Aliases are simply lowered to additional symbols with the same st_value as their aliasee. As long as the value of a symbol has to be concrete/numeric and cannot express a way to refer to another symbol, for aliases to make sense and have the correct semantics at this level, their aliasee must be defined at the IR level. Otherwise all you're left with at the object file level is an undefined symbol with no way to express that it 'wants to' alias an external symbol with some specified name. In other words, symbols are either undefined (st_shndx == 0, st_value meaningless) or defined (st_shndx != 0, st_value meaningful and holds a section offset). If we were to allow aliases to have undefined aliasees, they would decay to simple undefined symbols and lose their aliasee information.
- IFuncs are lowered to specially typed symbols whose st_value is the resolver. In much the same way as aliases, for this to actually have any meaning, the resolver must be defined (because you have no way to specify "the value is in another castle named 'XYZ'", only "defined at offset X" or "undefined"). When we allow ifuncs to have undefined resolvers, they decay to simple undefined symbols with the additional wart of having a special symbol type, but the desired resolver name is lost. Concretely, as long as the linker doesn't throw a fuss at said wart, for the references against that symbol from within the object file this will behave like a simple undefined external function. Because in your implementation one TU will have a cpu_dispatch and therefore a defined resolver, it will 'win' and intra-EXE/intra-DSO references against the ifunc will indeed be bound against the return value of the resolver. If no translation unit in the EXE/DSO had an ifunc with the same name and a defined resolver, you'd end up with a peculiar undefined symbol of type ifunc in the EXE/DSO (same as the .o).
It is my conclusion therefore that ifuncs with undefined resolvers behave exactly like function declarations (and lose the name of the resolver), as long as the linker is willing to accept such weird symbols.
Therefore, at the IR level, they're representational slack at best, and don't do what you want (possibly binding against a differently-named resolver) at worst, so they should not be allowed.
My understanding is the frontend's semantic rules are/were different from the IR rules, which is why we implemented it that way.
As I understand it, features like aliases and ifuncs consist mostly of vertical plumbing to expose low-level object-file semantics, and their design must be informed by them.
I'm not sure what you mean here? Are you suggesting that an undefined resolver should instead just implement an undefined 'function' for the .ifunc? This doesn't seem right? Wouldn't that cause ODR issues?
As I understand it, making the symbol your current implementation calls "x.ifunc" a function declaration which gets upgraded to an ifunc with a defined resolver on encountering cpu_dispatch would yield the correct behavior.
I guess I still don't understand what the practical limitation that requires ifuncs to have a defined resolver? The resolver is just a normal function, so it seems to me that allowing them to have normal linking rules makes sense? I personally think this is the least obtrusive change; this patch is adding a limitation that didn't exist previously unnecessarily.
I think I've addressed this in the wall of text above
It just seems odd I guess to name a function .ifunc, and not have it be an ifunc? What does our linker think about that?
Ah, the name is just a name :)
As far as the linker is concerned, it encounters an object file with an undefined symbol (of type STT_NOTYPE) and an object file with a defined symbol with the same name, of type STT_GNU_IFUNC. It will bind references in the former against the definition in the latter.
Here's my trying it out:
itay@CTHULHU ~/tmp/ifuncdecl/tu> cat main.c int foo(void); int main() { return foo(); } itay@CTHULHU ~/tmp/ifuncdecl/tu> cat foo.c static int foo_impl(void) { return 42; } static void *foo_resolver(void) { return &foo_impl; } int foo(void) __attribute__((ifunc("foo_resolver"))); itay@CTHULHU ~/tmp/ifuncdecl/tu> clang-14 -c main.c -o main.c.o itay@CTHULHU ~/tmp/ifuncdecl/tu> clang-14 -c foo.c -o foo.c.o itay@CTHULHU ~/tmp/ifuncdecl/tu> clang-14 main.c.o foo.c.o -o main itay@CTHULHU ~/tmp/ifuncdecl/tu> ./main itay@CTHULHU ~/tmp/ifuncdecl/tu [42]> itay@CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main.c.o --symbols | grep foo 4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND foo itay@CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU foo.c.o --symbols | grep foo 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c 3: 0000000000000000 16 FUNC LOCAL DEFAULT 2 foo_resolver 4: 0000000000000010 11 FUNC LOCAL DEFAULT 2 foo_impl 5: 0000000000000000 16 IFUNC GLOBAL DEFAULT 2 foo itay@CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main --symbols | grep foo 35: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c 36: 0000000000401150 16 FUNC LOCAL DEFAULT 13 foo_resolver 37: 0000000000401160 11 FUNC LOCAL DEFAULT 13 foo_impl 56: 0000000000401150 16 IFUNC GLOBAL DEFAULT 13 foo itay@CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main --relocations Relocation section '.rela.dyn' at offset 0x3d0 contains 2 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000403ff0 0000000100000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0 0000000000403ff8 0000000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0 Relocation section '.rela.plt' at offset 0x400 contains 1 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000404018 0000000000000025 R_X86_64_IRELATIVE 401150
Thats correct, these aren't 'aliases' or 'ifuncs' as far as the CFE is concerned; they are multiversioned functions. That 'Aliases' and 'ifunc' list in the CFE are the AST-constructs of those, not the IR constructs, so there is no reason to put the multiversioned thinks in that list, since they are implementation details. Emitting an error "invalid alias!"/etc for
I see, makes sense, thanks for the explanation.
From my perspective, an ifunc is just a linkable entity, as is a resolver. If the linker can merge symbols for the resolver (and an ifunc points to one), it seems to me to make a ton of sense to allow the resolver to be defined in another TU?
I guess I feel the same way with an alias, why can't I just alias to a function in a different TU, so long as this is a linkable entity?
To continue my point... an ifunc/resolver is just like a function, in that a non-defined declaration is completely valid, since it refers to a definition in a separate TU. It makes sense to me that a resolver could do the same.
Actually... I question the diagnostic changes in this patch now. Why SHOULDN'T this work:
TU A:
void *resolver(void);
void *ifunc(void) attribute((ifunc("resolver")));
TU B:
int impl(void) { return 42; }
void *resolver(void) { return impl; }
I'm referring you again to the start of my explanation (the first three paragraphs); the object file format (ELF) literally cannot express the semantics you're asking for. You're asking for it to support a symbol in a special kind of undefined state:
ABC = "undefined, but becomes defined to the value of a different symbol XYZ if it ever becomes defined".
In other words, the logical type of the value of a symbol is either empty/undefined or offset-into-some-section. It is never name-of-another-symbol.
That is, unless I'm completely missing some special-sauce in the Elf32_Sym/Elf64_Sym documentation which allows for st_value to somehow point at the name of a different undefined symbol (am I missing something? can you point me at documentation that allows for this use-case?).
I don't know much about the ELF format... but this works today? We can define a resolver in a different TU and it WORKS thanks to the linker? So there is perhaps something?
It sort-of-works only because you define the ifunc in both translation units (with the same name). But looks like it behaves incorrectly for references to the ifunc in the translation unit where the resolver is only declared, not defined:
> cat example1.ll @single_version_ifunc = weak_odr dso_local ifunc void (), void ()* ()* @single_version_resolver define void ()* @single_version_resolver() { ret void ()* null } define dso_local void @useage() local_unnamed_addr { entry: tail call void @single_version_ifunc() ret void } > clang -c example1.ll -o example1.ll.o > llvm-readobj --relocations example1.ll.o File: example1.ll.o Format: elf64-x86-64 Arch: x86_64 AddressSize: 64bit LoadName: <Not found> Relocations [ Section (3) .rela.text { 0x11 R_X86_64_PLT32 single_version_ifunc 0xFFFFFFFFFFFFFFFC <-- FINE } Section (6) .rela.eh_frame { 0x20 R_X86_64_PC32 .text 0x0 0x34 R_X86_64_PC32 .text 0x10 } ] > cat example2.ll @single_version_ifunc = weak_odr dso_local ifunc void (), void ()* ()* @single_version_resolver declare void ()* @single_version_resolver() define dso_local void @useage() local_unnamed_addr { entry: tail call void @single_version_ifunc() ret void } > clang -c example2.ll -o example2.ll.o > llvm-readobj --relocations example2.ll.o File: example2.ll.o Format: elf64-x86-64 Arch: x86_64 AddressSize: 64bit LoadName: <Not found> Relocations [ Section (3) .rela.text { 0x1 R_X86_64_PLT32 single_version_resolver 0xFFFFFFFFFFFFFFFC <-- WHOOPS } Section (6) .rela.eh_frame { 0x20 R_X86_64_PC32 .text 0x0 } ]
Hmm... we've never had any problems with cpu-dispatch/specific on this before? An example just like that works in all my internal tests.
I don't know much about the ELF format... but this works today? We can define a resolver in a different TU and it WORKS thanks to the linker? So there is perhaps something?
The ifunc symbol that is emitted in the TU with the undefined resolver loses its connection to the resolver and the calls to the ifunc are instead bound against the resolver itself (which is absolutely not what you want).
itay> cat specific.c #include <stdio.h> __attribute__((cpu_specific(generic))) void single_version(void){ puts("In single_version generic"); } void useage() { single_version(); } itay> cat dispatch_main.c void useage(void); __attribute__((cpu_dispatch(generic))) void single_version(void); int main() { useage(); single_version(); return 0; } itay> clang -c dispatch_main.c -o dispatch_main.c.o itay> clang -c specific.c -o specific.c.o itay> clang specific.c.o dispatch_main.c.o -o main itay> ./main In single_version generic
This line should have been printed twice, not once.
I see... thank you for your patience, my knowledge of ELF (or, that is, basically everything after Clang LLVM-IR generation) is pretty slim.
I think what I would need to be able to correctly implement this is some level of 'forward declarable' ifunc. I have to be able to process declarations as we reach them, so just doing a function declaration then rewriting it wouldn't be allowed.
Is it possible to get some IR-level 'ifunc' declaration implemented? If so, I think I have a good idea on how to implement that in the CFE.
That feature already exists - use a plain old function declaration :)
My mental model for this is like this:
memcpy one of the is the most widely popular APIs commonly implemented as an ifunc. In clients of this API, it's just a plain old function declaration. In the implementer of this API, it's an ifunc with a defined resolver. Nothing new here.
It's true that this usage usually crosses a dynamic-linking boundary (rather than static linking), but a lot of the times dynamic linking and static linking are set up to mirror each other in behavior.
What I'm proposing is as follows. I really haven't read the existing implementation yet, so I'm not sure if it makes 100% sense in terms of it, but bear with me:
- When processing cpu_specific, emit a plain old function declaration "x.ifunc";
- When processing cpu_dispatch:
- Create an unnamed ifunc (call it GI)
- Call CodeGenModule::GetGlobalValue
- If the result was null, set the name of the ifunc and continue
- If the result wasn't null (call it F), use GI->takeName(F); F->replaceAllUsesWith(GI);
- Throughout, references against the multiversioned symbol are bound against the correctly named global. It just so happens that it could either begin its life as an ifunc and remain that way, begin its life as function declaration and remain that way, or begin its life as a function declaration and get upgraded (by RAUW) to an ifunc.
You can think of this as mirroring the behavior of the IR linker - linking a bitcode module containing an ifunc definition into an existing module where there's a function declaration with the same name as the ifunc simply tramples over the function declaration (RAUWs it with the linked-in ifunc). This is exactly what happens in llvm/test/Linker/ifunc.ll for bar.
If the result wasn't null (call it F), use GI->takeName(F); F->replaceAllUsesWith(GI);
When I wrote the multiversioning support in the first place, I was told that the above was unacceptable and the CFE doesn't wish to do that anymore. There ARE some examples, but I was instructed to not make it worse.
I see. What is the guiding principle there, though? Generating correct IR "up front" / "the first time" rather than "fixing it up as you go via manipulations"? (could you give a link?)
I can see the engineering consideration in not letting IR manipulations creep into the CFE, I just want to make sure that's the principle that we're asked to follow.
In the end this isn't the first instance where the "streaming" design of GlobalDecl emission forces a fixup/rewrite of a previous decision, as can be evidenced by a close sibling of this feature, CodeGenModule::EmitAliasDefinition (which uses exactly that idiom, for a very similar use-case)...
It will always be either a fixup or an accumulate/commit idiom, since there will always be a GlobalDecl ordering where the information to make 'the perfect module-global decision' isn't available upon having to act on partial information.
In the end, I would like to have the verification of IFuncs having a defined resolver restored (and avoid more dependencies being taken on this being 'allowed' at the IR level), since having LLVM emit an object file with an undefined STT_GNU_IFUNC is probably just trouble and confusion waiting to happen.
Sorry for being late to the party. Itay's explanation is correct to me.
I don't know much about multiversioned functions. My reply below if for ifunc.
- On ELF, an alias is just a symbol sharing st_shndx/st_value with another symbol. It is not by name. The target symbol can be overridden (say STB_WEAK overridden by STB_GLOBAL) while the alias itself remains unchanged.
- GNU indirect function is an ELF specific feature. Mach-O has a similar feature N_SYMBOL_RESOLVER but it is not modeled by an LLVM IR construct.
- While an assembler can create an "undefined ifunc" ({st_shndx=0, st_info=STB_GLOBAL<<4 | STT_GNU_IFUNC}), it is no different from a regular undefined symbol ({st_shndx=0, st_info=STB_GLOBAL<<4 | STT_NOTYPE}). You can check that the linker just ignore the STT_GNU_IFUNC type when replacing it with a definition. [1]
- For the GNU function attribute __attribute__((ifunc(...))), GCC requires that the target is defined. The LLVM IR construct models the C/C++ syntax, so it makes sense to enforce the same requirement.
[1]:
echo '.type foo, @gnu_indirect_function' > a.s echo '.globl foo; .type foo, @function; foo: ret' > b.s gcc -c a.s b.s ld a.o b.o readelf -Ws a.out
... If no translation unit in the EXE/DSO had an ifunc with the same name and a defined resolver, you'd end up with a peculiar undefined symbol of type ifunc in the EXE/DSO (same as the .o).
This is the ld.lld behavior.
GNU ld appears to leave a STT_FUNC undefined symbol. The idea may be that when cross-DSO, STT_GNU_IFUNC loses meaning.
In this case using "undefined ifunc" in the first place is a user error.
GNU indirect functions are largely under-specified. glibc implemented it and FreeBSD adopted it.
Few years ago, glibc folks wrote https://sourceware.org/glibc/wiki/GNU_IFUNC to specify some behaviors they expected to work.
You can see
Requirement (a): Resolver must be defined in the same translation unit as the implementations.
An undefined STT_GNU_IFUNC violates this requirement.
The second requirement says
Requirement (b): Cannot be weakly defined functions.
This mostly wants to warn you that STB_GLOBAL overriding STB_WEAK can lead to weird behaviors, and can be used as an excuse if the linker output has weird behaviors.
In practice STB_GLOBAL STT_GNU_IFUNC overriding STB_WEAK STT_GNU_IFUNC works fine in at least ld.lld, so it not probably unnecessary to strengthen the Clang and LLVM IR verifier diagnostics.
The justification is that we're supposed to be able to (as best as we can) work in a REPL environment or similar, , like Cling(not a misspell) does. The idea is that we should be able to always be emitting 'valid' and 'final' IR so that we can generate source 1 declaration at a time and be valid.
We DO have things that break this (as you've mentioned), but the CFE's policy is to not break any additional cases.
And how is Cling expecting CFE to deal with partial knowledge situations at the implementation level? To deal with exactly the non-local cases that the current violations address?
If there's no prescriptive way forward to dealing with these cases (so they're tech debt without a remediation plan), then as far as I'm concerned this case sits exactly under the same tech-debt umbrella of the existing violations and the way forward is using the same violating solution for this case too.
We definitely shouldn't block the IR verification indefinitely on this impasse. Once an acceptable solution is found, this will also be part of that refactor.
My understanding is that the REPL setup is that the 'IR' just needs to be in a state where it doesn't require reverts/rewrites later, so that we can do partial-back-end-code-generation as we go along. That said, I'm not positive as to the implications. I'm just repeating the discussion the CFE code-owner made at the time.
IMO, the 'acceptable' solution is to have a way to forward-declare an ifunc in IR so that we can fill in the resolver later on. From your description earlier, it seems that this would work as we could emit it as an 'unknown symbol' (as if it was an undefined function declaration), and would be completely implementable in the CFE.
So it would change your plan from earlier to:
When processing cpu_specific, emit the ifunc "x.ifunc", with no resolver;
When processing cpu_dispatch:
Get/Create the ifunc, then pull up the resolver. If the resolver is null (as it should be), create one and update the 'ifunc'. Generate said resolver.
Speaking of the incremental compilation case, we can experiment with the clang-repl binary. I am not sure about the details of this discussion but here is an example that works today:
llvm/build/bin/clang-repl clang-repl> __attribute__((cpu_specific(ivybridge))) void single_version(void){} clang-repl> void useage() {single_version();} clang-repl> quit
What would it be a good example to check if the incremental compilation case is covered?
I feel like we're getting lost in the weeds here.
At the time a bitcode module is finalized, it is supposed to be in a valid state.
The LLVM bitcode verifier does not consider GlobalAliases which have either a null or an undefined aliasee to be valid. The same should have held for GlobalIFuncs and their resolvers since their inception, and the fact that it were not so is an oversight.
There are two separate issues here which we need to decouple:
- IFuncs with undefined resolvers were never valid, the verification just happened to be missing. They also misbehave, as demonstrated by my example above where a TU contains both cpu_specific and a usage, but no cpu_dispatch. Therefore, we'd not be making anything worse by plugging the current hole in the verification of GlobalIFunc and keeping cpu_specific/cpu_dispatch working, using the same method we use to handle aliases or ifuncs that come after declarations, i.e. cpu_specific emits plain function declarations, cpu_dispatch upgrades them via takeName+RAUW+eraseFromParent if they already exist. We'll have made the verification more robust, fixed a bug when there's a TU where there's cpu_specific + usage but no cpu_dispatch, and not have incurred more tech debt by doing so (since that tech debt already exists, and a solution would need to address all these cases in exactly the same way).
- Clang-repl/CGM needs infrastructure for dealing with this sort of 'backtracking' in incremental compilation use-cases (declaration gets upgraded to alias, declaration gets upgraded to ifunc). If it needs IR changes for that, then they should be designed, agreed upon, and integrated. We're not going to have a decision made in this closed PR discussion that either GlobalAliases or GlobalIFuncs support a declaration state all of a sudden. Such decisions could have cross-cutting ramifications in that there would all of a sudden be 3 ways to represent things that are equivalent to/get lowered to function declarations, rather than one. Limiting ourselves to not using the existing solution for these use-cases/bugs in normal compilation with the hope that it will ease the creation of a solution for incremental compilation of the same use-cases is self-defeating.
As for clang-repl, I've so far tried the following; I get the sense that I'm poking around in less well-specified places (asserts and null-deref crashes):
itay ~/llvm-project/build (main)> bin/clang-repl clang-repl> #include <stdio.h> clang-repl> __attribute__((cpu_dispatch(generic))) void a(void); clang-repl> __attribute__((cpu_specific(generic))) void a(void) { puts("hi"); } In file included from <<< inputs >>>:1: input_line_2:1:55: warning: body of cpu_dispatch function will be ignored [-Wfunction-multiversion] __attribute__((cpu_specific(generic))) void a(void) { puts("hi"); } ^ clang-repl> auto b = (a(), 5); clang-repl: /home/itay/llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1683: void llvm::AsmPrinter::emitGlobalIFunc(llvm::Module &, const llvm::GlobalIFunc &): Assertion `GI.hasLocalLinkage() && "Invalid ifunc linkage"' failed. fish: 'bin/clang-repl' terminated by signal SIGABRT (Abort) itay ~/llvm-project/build (main) [SIGABRT]> bin/clang-repl clang-repl> extern int g_a __attribute__((alias("g_b"))); In file included from <<< inputs >>>:1: input_line_0:1:31: error: alias must point to a defined variable or function extern int g_a __attribute__((alias("g_b"))); ^ fish: 'bin/clang-repl' terminated by signal SIGSEGV (Address boundary error) itay ~/llvm-project/build (main) [SIGSEGV]> bin/clang-repl clang-repl> void foo(void) __attribute__((ifunc("foo_resolver"))); In file included from <<< inputs >>>:1: input_line_1:1:31: error: ifunc must point to a defined function void foo(void) __attribute__((ifunc("foo_resolver"))); ^ fish: 'bin/clang-repl' terminated by signal SIGSEGV (Address boundary error) itay ~/llvm-project/build (main) [SIGSEGV]> bin/clang-repl clang-repl> void foo(void); clang-repl> void bar(void) __attribute__((alias("foo"))); In file included from <<< inputs >>>:1: input_line_1:1:31: error: alias must point to a defined variable or function void bar(void) __attribute__((alias("foo"))); ^ fish: 'bin/clang-repl' terminated by signal SIGSEGV (Address boundary error) itay ~/llvm-project/build (main) [SIGSEGV]> bin/clang-repl clang-repl> void foo(void); void bar(void) __attribute__((alias("foo"))); void foo(void) {} In file included from <<< inputs >>>:1: input_line_0:1:47: error: alias must point to a defined variable or function void foo(void); void bar(void) __attribute__((alias("foo"))); void foo(void) {} ^ fish: 'bin/clang-repl' terminated by signal SIGSEGV (Address boundary error) itay ~/llvm-project/build (main) [SIGSEGV]> bin/clang-repl clang-repl> void foo(void) {} clang-repl> void bar(void) __attribute__((alias("foo"))); In file included from <<< inputs >>>:1: input_line_1:1:31: error: alias must point to a defined variable or function void bar(void) __attribute__((alias("foo"))); ^ fish: 'bin/clang-repl' terminated by signal SIGSEGV (Address boundary error)
FWIW, the comments above are about not making it 'worse' at least/adding more wo
To me the 'not in the weeds' part is, "How do I get CPU-dispatch/CPU-specific to work without RAUW, since that is offensive to the CFE code owner? Additionally, I found that some of the examples without the defined resolver actually DO work in my downstream, though I don't know what changes we make to make that happen. So adding this limitation in actually breaks my downstream.
The LLVM bitcode verifier does not consider GlobalAliases which have either a null or an undefined aliasee to be valid. The same should have held for GlobalIFuncs and their resolvers since their inception, and the fact that it were not so is an oversight.
Citation Needed here. Is there a design document that specifies this, or is this your opinion? If its the latter, the implementation was the documentation, so this is still a breaking change.
To me the 'not in the weeds' part is, "How do I get CPU-dispatch/CPU-specific to work without RAUW, since that is offensive to the CFE code owner? Additionally, I found that some of the examples without the defined resolver actually DO work in my downstream, though I don't know what changes we make to make that happen. So adding this limitation in actually breaks my downstream.
Regarding the examples that do work, I've provided explanations as to how they partially sort-of-work, but that the general semantics are broken (the 'connection' to the resolver is 'severed' in each translation unit that has it as a declaration + all references from within that TU are borked).
The problems that the CFE currently uses RAUW to solve are fundamental to the specified behavior of the language features that use it, so a solution for these problems needs to arise from the CFE. Imagine the following incremental compilation use-case:
> extern int x; // x is a GlobalVariable declaration (initializer == null). > int y; // y is a GlobalVariable definition (initializer = constant 0). > extern int x __attribute__((alias("y"))); // x needs to transform into a GlobalAlias now. > > extern void *my_memcpy(void *dest, void *src, size_t n); // my_memcpy is a Function declaration > void *my_memcpy(void *dest, void *src, unsigned long n) __attribute__((ifunc("my_memcpy_resolver"))); // my_memcpy_resolver is a Function declaration, my_memcpy is a GlobalIFunc with declaration for a resolver - whoops, incremental module is invalid at this point > static void *my_memcpy_impl(void *dest, void *src, unsigned long n) { return 0; } > static void *my_memcpy_resolver(void) { return &my_memcpy_impl; } // Now my_memcpy ifunc has defined resolver > > void CpuSpecific(void); // CpuSpecific is a Function declaration > void Foo(void) { CpuSpecific(); } // Foo calls a function declaration > __attribute__((cpu_specific(generic))) void CpuSpecific(void) { puts("generic"); } // We still don't know whether cpu_dispatch will be in this TU or not. Do we upgrade CpuSpecific from a function declaration to a definition? to an ifunc? Bind directly to this body? > __attribute__((cpu_dispatch(generic))) void CpuSpecific(void); // Now we know that we have to upgrade it to an ifunc, and how the resolver should look. But it's already a Function (declaration).
All the above need to work in non-incremental compilation, and the only existing way for them to work right now is RAUW or a module finalization step. If the CFE code owner(s) find that offensive but proposes no alternative, what is one to do? There are 3 possible states here:
- Remain broken
- Solve using RAUW/finalization
- Solve using yet-unproposed non-RAUW solution (which solves all of the above use-cases simultaneously because they're essentially the same)
It is better to treat (2) as a way out of (1) which doesn't increase the cost of (3), than to just stay at (1) until (3) happens. As it currently stands, the somewhat similar issue of calling a declared-but-undefined function in the REPL is currently embodied as a JIT session error, with failure to materialize symbols. Perhaps a valid solution for the work-in-progress aliases and ifuncs is to transform them into declarations in the JIT module until they have definitions for their aliasees/resolvers. But we won't know without more context. I think it might be more effective that I'll write a patch up which does use RAUW together with a test for the breakage we discussed and we'll continue the discussion with the CFE code owner(s) there.
The LLVM bitcode verifier does not consider GlobalAliases which have either a null or an undefined aliasee to be valid. The same should have held for GlobalIFuncs and their resolvers since their inception, and the fact that it were not so is an oversight.
Citation Needed here. Is there a design document that specifies this, or is this your opinion? If its the latter, the implementation was the documentation, so this is still a breaking change.
I'm assuming that you're talking about GlobalIFuncs, since for GlobalAliases you'll see both constraints encoded in the Verifier::visitGlobalAlias and Verifier::visitAliaseeSubExpr functions.
As for GlobalIFunc, I have no design document for the LLVM-IR-level representation of the feature, but:
- @MaskRay has provided reference for the object-format constraint itself (Requirement (a): Resolver must be defined in the same translation unit as the implementations.). Bitcode Modules correspond to TUs - so it stands to reason that the same restriction apply to them, unless behavior is implemented to bridge the gap.
- I've demonstrated that compiling a bitcode module containing an ifunc with an undefined resolver emits broken code for all usages of the ifunc-with-no-resolver, where I can probably massage that into a crash with a small bit of additional work (since it ends up calling the resolver rather than calling the return value of the resolver). This applies to current use-cases of cpu_specific in translation units that don't contain the corresponding cpu_dispatch, which I understand is part of the documented usage model of cpu_specific/cpu_dispatch. So it's already subtly broken, and in reference to (1), this means that behavior to bridge the gap between what the IR ostensibly allows and what the object format allows is not implemented.
- CFE already diagnoses this issue for plain ifuncs and aliases.
- I've drawn significant parallels between aliases and ifuncs at the object file level, and extrapolated the reasoning for why aliases have to point at defined objects to the reasoning why ifuncs have to point at defined objects.
So, when I say "should have held" I am indeed expressing my opinion. But I do believe that I've backed it up substantially, and that from a design perspective adding that restriction to the IR really is the correct decision.
Calling the implementation the documentation doesn't ring right with me - is the broken behavior I referred to in (2) a compatibility constraint now? Are the bizarre disallowed-by-the-specification undefined STT_GNU_IFUNC symbols emitted by cpu_specific-without-cpu_dispatch a compatibility constraint now?
As I mentioned, my downstream actually DOES work, I believe our code-generator has some changes that are making this work, but I haven't found the actual commit yet. I suspect it is some level of changes that emit ifuncs with undefined resolvers differently.
The problems that the CFE currently uses RAUW to solve are fundamental to the specified behavior of the language features that use it, so a solution for these problems needs to arise from the CFE. Imagine the following incremental compilation use-case:
> extern int x; // x is a GlobalVariable declaration (initializer == null). > int y; // y is a GlobalVariable definition (initializer = constant 0). > extern int x __attribute__((alias("y"))); // x needs to transform into a GlobalAlias now. > > extern void *my_memcpy(void *dest, void *src, size_t n); // my_memcpy is a Function declaration > void *my_memcpy(void *dest, void *src, unsigned long n) __attribute__((ifunc("my_memcpy_resolver"))); // my_memcpy_resolver is a Function declaration, my_memcpy is a GlobalIFunc with declaration for a resolver - whoops, incremental module is invalid at this point > static void *my_memcpy_impl(void *dest, void *src, unsigned long n) { return 0; } > static void *my_memcpy_resolver(void) { return &my_memcpy_impl; } // Now my_memcpy ifunc has defined resolver > > void CpuSpecific(void); // CpuSpecific is a Function declaration > void Foo(void) { CpuSpecific(); } // Foo calls a function declaration > __attribute__((cpu_specific(generic))) void CpuSpecific(void) { puts("generic"); } // We still don't know whether cpu_dispatch will be in this TU or not. Do we upgrade CpuSpecific from a function declaration to a definition? to an ifunc? Bind directly to this body? > __attribute__((cpu_dispatch(generic))) void CpuSpecific(void); // Now we know that we have to upgrade it to an ifunc, and how the resolver should look. But it's already a Function (declaration).All the above need to work in non-incremental compilation, and the only existing way for them to work right now is RAUW or a module finalization step. If the CFE code owner(s) find that offensive but proposes no alternative, what is one to do? There are 3 possible states here:
- Remain broken
As mentioned above, not necessarily always broken, my downstream works with your examples of how its broken. Breaking this is IMO, not acceptable.
- Solve using RAUW/finalization
This is unacceptable to the Clang CFE maintainer, so I believe this is a non-starter.
- Solve using yet-unproposed non-RAUW solution (which solves all of the above use-cases simultaneously because they're essentially the same)
My preference is clearly this one. If we had a way to create the ifunc in the CFE with NO resolver (that is, a forward-declaration of an ifunc!) that could be emitted as-if it were a normal function call (which is what you suggested to happen in the CFE, so this is just suggesting it later), it solves my problem. As far as I can tell, this is what my downstream is doing (except the key-off here is whether the resolver is not defined, not null). Can we discuss an actual solution as a part of this discussion?
It is better to treat (2) as a way out of (1) which doesn't increase the cost of (3), than to just stay at (1) until (3) happens. As it currently stands, the somewhat similar issue of calling a declared-but-undefined function in the REPL is currently embodied as a JIT session error, with failure to materialize symbols. Perhaps a valid solution for the work-in-progress aliases and ifuncs is to transform them into declarations in the JIT module until they have definitions for their aliasees/resolvers. But we won't know without more context. I think it might be more effective that I'll write a patch up which does use RAUW together with a test for the breakage we discussed and we'll continue the discussion with the CFE code owner(s) there.
The LLVM bitcode verifier does not consider GlobalAliases which have either a null or an undefined aliasee to be valid. The same should have held for GlobalIFuncs and their resolvers since their inception, and the fact that it were not so is an oversight.
Citation Needed here. Is there a design document that specifies this, or is this your opinion? If its the latter, the implementation was the documentation, so this is still a breaking change.
I'm assuming that you're talking about GlobalIFuncs, since for GlobalAliases you'll see both constraints encoded in the Verifier::visitGlobalAlias and Verifier::visitAliaseeSubExpr functions.
As for GlobalIFunc, I have no design document for the LLVM-IR-level representation of the feature, but:
- @MaskRay has provided reference for the object-format constraint itself (Requirement (a): Resolver must be defined in the same translation unit as the implementations.). Bitcode Modules correspond to TUs - so it stands to reason that the same restriction apply to them, unless behavior is implemented to bridge the gap.
- I've demonstrated that compiling a bitcode module containing an ifunc with an undefined resolver emits broken code for all usages of the ifunc-with-no-resolver, where I can probably massage that into a crash with a small bit of additional work (since it ends up calling the resolver rather than calling the return value of the resolver). This applies to current use-cases of cpu_specific in translation units that don't contain the corresponding cpu_dispatch, which I understand is part of the documented usage model of cpu_specific/cpu_dispatch. So it's already subtly broken, and in reference to (1), this means that behavior to bridge the gap between what the IR ostensibly allows and what the object format allows is not implemented.
- CFE already diagnoses this issue for plain ifuncs and aliases.
So its important to note here that a 'language level' ifunc is not a perfect match to the IR-level ifunc, which is not a perfect match to an object-file ifunc. These are 3 different features that map closely, but not exactly.
- I've drawn significant parallels between aliases and ifuncs at the object file level, and extrapolated the reasoning for why aliases have to point at defined objects to the reasoning why ifuncs have to point at defined objects.
So, when I say "should have held" I am indeed expressing my opinion. But I do believe that I've backed it up substantially, and that from a design perspective adding that restriction to the IR really is the correct decision.
I appreciate the explanation.
Calling the implementation the documentation doesn't ring right with me - is the broken behavior I referred to in (2) a compatibility constraint now? Are the bizarre disallowed-by-the-specification undefined STT_GNU_IFUNC symbols emitted by cpu_specific-without-cpu_dispatch a compatibility constraint now?
In absence of a spec, the implement-is-the-spec just by nature. As I mentioned, my downstream DOES have this work, so this is a compatibility constraint (and we as a project tend to at least try to work with downstreams by not breaking them this aggressively). That said, I am, and have been willing to work with you to make the reasonable changes here. I'm trying to get us to a solution that doesn't break my users and allows you to recommit your patch.
Unrelated to missing resolver definition, this change doesn't accommodate resolvers that take parameters. (Curiously, this verification only fails with ThinLTO).
// with -flto=full or without -flto=thin, below command works $ clang -shared ifunc.cpp -fPIC -fuse-ld=lld -flto=thin IFunc resolver has incorrect type i32 ()* @_Z5ifuncv $ cat ifunc.cpp #include <stdint.h> typedef int (*fn_ptr_t)(); int ifunc() __attribute__((ifunc("resolver"))); int ret42() { return 42; } extern "C" fn_ptr_t resolver(uint64_t hwcap) { return ret42; }
I have a change that fixes the above use case but causes some opaque pointer tests to fail. I'll investigate and upload once they're fixed.
I also noticed that this patch didn't add any test for the "IFunc resolver has incorrect type" errors cases. It'd be good to add those in a follow-up.
I now realize that the type check isn't correct for the platforms which pass arguments to the resolver. Unfortunate that the glibc wiki doesn't mention this (as far as I can tell)...
I thought that the bitcast-to-"expected"-type should shield from that error, but maybe something drops the bitcast along the way. That reminded me of https://github.com/llvm/llvm-project/blob/f6ee45e94391ef8cee67e2a4ad6d61c614985de9/llvm/lib/Transforms/IPO/LowerTypeTests.cpp#L388-L391.
In addition, it sounds to me that the resolver type check will be made redundant by the opaque pointer work, so maybe it makes sense to remove it altogether now? I'm not in the details enough with respect to the migration plan to know.
Also, I recall there are some outstanding issues with respect to thinlto+ifunc: https://reviews.llvm.org/D82745 which may be of interest to your use-case as well.
Drive-by comment, not sure if it's relevant: clang doesn't run the verifier (by default) when assertions are turned off. This logic is in the Clang driver; if you add -### or -v you'll see the -cc1 has -disable-llvm-verifier listed whenever assertions are off. I believe if it reads a bitcode file, the "read bitcode" logic will run the verifier anyway; this just disables the "run verifier after IRGen" logic.