- User Since
- Feb 6 2020, 9:58 AM (200 w, 1 d)
Sep 29 2023
Sep 25 2023
The patch tested on NDK r26 bulding simple Function Multi Versioning project and running on Android API 25,29,30,33 - works fine.
Use weak "memfd_create" to check for API 30+, split and rename init cpu features functions ( __init_cpu_features, __init_cpu_features_resolver, __init_cpu_features_constructor )
Sep 8 2023
Sep 6 2023
Sep 4 2023
Offtopic: Outlining atomics seems to be very CPU specific thing. In my experience LSE were ~= old exclusive semantics. So adding extra call + extra bit check (too bad IFUNCs are not used :)) each time it would be executed seems to be quite an extra load (for CPU, TLB, dcache..), so I'm not sure that outline atomics is a win-win thing (at least on some of the CPUs). This is absolutely not a case for this patch anyway, just some of my thoughts, I would be glad to hear other opinions :)
Outline atomics overhead is mostly negligible. "Various members in the Arm ecosystem have measured the performance impact of this indirection on a diverse set of systems and we were happy to find out that it was minimal compared to the benefit of using the LSE instructions for better scalability at large core counts." https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/making-the-most-of-the-arm-architecture-in-gcc-10
For IFUNCs Function Multi Versioning https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning lse feature can be used.
I understand that you could disable it with extra option and that for now it would be different with gcc, but it looks debatable to me that such a behaviour in gcc is correct and expected, maybe someone need to change it there too. (Please keep in mind that I might be wrong with my position and during discussion please consider this patch as NFC so we could discuss here am I right or not :)). My point is that the gcc is not a golden standard in this questions too and in my mind such a behaviour (disabling extra dependencies in freestanding mode without passing extra flags) would be expected the very same way the -fno-builtint get's auto implied with -ffreestanding flag passed.
Outline atomics are dependent on runtime library availability ( libgcc or compler-rt ). If there are no proper library available they will be disabled. So if in freestanding mode compiler is not dependent on runtime library you can remove it and get rid of outline atomics calls automatically.
Having runtime library in freestanding mode you can disable outline atomics specifying -mno-outline-atomics option. But disabling them just by -ffreestanding option will create divergence with GCC behaviour, which has outline atomics not disabled in this case.
From what i understand in freestanding environment runtime library is not used and clang driver can detect this situation and set outline atomics off by default ( see clang/lib/Driver/ToolChains/Linux.cpp Linux::IsAArch64OutlineAtomicsDefault -> GetRuntimeLibType )
On AArch64 for FMV we are using target_version attribute and AppendTargetVersionMangling with llvm::stable_sort
Aug 24 2023
Aug 23 2023
Aug 21 2023
Friendly ping, are there any questions remained to proceed with target-independent __builtin_cpu_supports ?
Jul 24 2023
Jul 19 2023
I fully agree with you to make __builtin_cpu_supports() target-independent ( and I will update my patch on top of yours ). If plus-separated format is not supported by all targets then I would suggest to make SemaBuiltinCpuSupports target-dependent - it allows to implement plus-separated format on AArch64 keeping __builtin_cpu_supports() target-independent.
Jul 13 2023
Jul 11 2023
Jun 16 2023
Thank you for the patch, it comes in the right time - we are also working on AArch64 __builtin_cpu_supports, and I was thinking how to make it more general.
I uploaded our RFC version for review https://reviews.llvm.org/D153153
It would be great to have in __builtin_cpu_supports argument string of plus-separated features. Just SemaBuiltinCpuSupports need to handle this.
May 23 2023
May 18 2023
May 17 2023
Apr 20 2023
I think the order should instead be:
- include system headers that potentially defined HWCAP_xxx
- block with #ifndef HWCAP_xxx ... #define HWCAP_xxx ...
- block with inline functions actually using HWCAP_xxx
Absolutely correct. I think system headers includes
#if defined(__has_include) #if __has_include(<sys/auxv.h>) #include <sys/auxv.h> #if __has_include(<asm/hwcap.h>) #include <asm/hwcap.h>
Mar 8 2023
Rebasing and addressing comments.
Mar 7 2023
Jan 30 2023
I fully agree, thank you for valuable comments! Let me address them in separate patch.
Jan 23 2023
Sorry, commit rG5474d7d93271 is not related to this, I put wrong differential revision link
Jan 20 2023
Jan 9 2023
LGTM, thanks for fixing this!
Jan 8 2023
Jan 7 2023
Sounds good to me, please look into the patch adding such cmake option https://reviews.llvm.org/D141199
Using cmake -DCOMPILER_RT_DISABLE_AARCH64_FMV=On will exclude Aarch64 multiversioning support from compiler-rt build.
Jan 6 2023
Right, you will need to explicitly provide '-mno-fmv` then. Currently __aarch64_cpu_features cpu_model.c stuff located in bultins (libclang_rt.builtins-aarch64.a). Did I understand you correctly that your apps linked agains libclang_rt.builtins-aarch64.a and if we move function multiversioning part to new library, lets say libclang_rt.cpu_features-aarch64.a, that will resolve your concern ? As a sidenote, builtins/cpu_model.c contains X86 CPU features used in function multiversioning on that target as well.
Function multiversioning expects compiler-rt has __aarch64_cpu_features, it will be broken if compiler-rt miss that ( clang/lib/Driver/ToolChains/Clang.cpp:7231 ). I believe function multiversioning will be used in Android as outline atomics already did.
I also had a couple of general questions, since I think I'm missing something obvious:
- How come we need both init_cpu_features and init_cpu_features_resolver? It seems like we're codegenning calls to the latter where needed, so I'm not sure what the former is adding on top.
- From what I can see, we're codegenning calls to init_cpu_features_resolver without any arguments, but the actual definition of that function has two arguments. How does that work?
hwcaps are ABI specified arguments of ifunc resolver. The constructor init_cpu_features calls getauxval to initialize hwcaps and then pass them explicitly to init_cpu_features_resolver. If resolver called before constructor we get init_cpu_features_resolver hwcap and hwcap2 as arguments from dynamic loader.
Jan 3 2023
FMV patch with target feautures dependencies relanded fe5cf480ee5a
Dec 28 2022
Thanks! Fix committed 2184fcf17ee00a939b3bde98a28ef586c67d6b1a
Dec 27 2022
Dec 22 2022
Many thanks @hctim for investigation! I refactored Strings and StringsBuffer not to capture dangling references - use-of-uninitialized-value errors were fixed. I would reland this patch if you don't mind.
I don't think std::equal is underlying cause here. We have featuresStrs_size() comparison before calling it:
Dec 21 2022
I've managed to reproduce "MemorySanitizer: use-of-uninitialized-value" error locally, thank you @hctim for help!
If I understand it right, it seems MSan didn't handle correctly SmallVector - a variable-sized array with some number of elements in-place and heap allocation for additional elements if needed:
Regular builds works fine for me, pthreads located here "/lib/x86_64-linux-gnu/libpthread.so" "/usr/lib/x86_64-linux-gnu/libpthread.so". Enabling "-DLLVM_USE_SANITIZER=Memory" resulted in many "WARNING: MemorySanitizer: use-of-uninitialized-value" on tblgen like:
Dec 20 2022
The test was fixed bf94eac6a3f7c5cd8941956d44c15524fa3751bd ( it missed the case that Fuchsia has ToolChain::RLT_CompilerRT as default GetDefaultRuntimeLibType ).
Please let me know if you have any issues remained with the patch.
Thanks for report, fixed in a43f36142c501e2d3f4797ef938db4e0c5e0eeec
Dec 19 2022
Does anyone have any more objections? I'm going to merge it.
Dec 11 2022
Comments have beed addressed.
Dec 5 2022
Dec 1 2022
Rebase on top of trunk. Addressed comments. Fixed initFeatureMap issue resulting in some fmv features not enabled on target, use it and AARCH64_ARCH_EXT_NAME for target attribute features dependencies as well.
Nov 23 2022
Addressing comments and ACLE specification updates.
sme-f64 -> sme-f64f64 and sme-i64 -> sme-i16i64 features are expected to be renamed in specification.
Sep 26 2022
@sebpop could you ellaborate on __sync_* operations usage, are you getting issues with current Clang implementation? Do Clang need to keep supporting them and fix introducing new memory model? It seems we need compelling reasons to do that.
Jul 14 2022
Jun 20 2022
Commited in af6d2a0b6825e71965f3e2701a63c239fa0ad70f
Jun 18 2022
Addressing comments, stylistic corrections.
Jun 15 2022
Thank you for fair concern, "target_clones" for AArch64 has different format, semantic, e.g. "default" is not required. Therefore it diverges with X86 in these parts. "target" attribute has been already used and supported on AArch64 in a different sense, like target("arm"), target("dotprod"), target("branch-protection=bti"). The intention of creating new "target_version" attribute is not to overlap with that. It also has different format, mangling and semantic, e.g. treating function without attribute as "default", and option to disable attribute droping function multi versions. Do these explanations dispel your concern?
Jun 14 2022
May 17 2022
I think it looks reasonable to define 5th memory model, add barriers __sync_* builtins and to outline-atomics calls as well.
Apr 19 2021
Apr 13 2021
Just a friendly ping. Jessica @paquette, do you have any further suggestions?