True, I was aware of some of these options; thank you for pointing out the others. I'm not yet entirely convinced, especially since clang-tidy's behaviour might differ.
I may be over-reacting to the way the patch seemed to be touching on the C++ ABI in multiple places. My understanding is that ms_abi is just a calling-convention attribute; it's basically "use the (default) calling convention that MSVC would use for this function". If that's all you want, then this is reasonable, although I am worried about creating a new attribute for every system that Wine chooses to target.
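To make the "just a calling-convention attribute" reading concrete, here is a minimal sketch, assuming an x86-64 target built with GCC or Clang. The macro `EXAMPLE_MS_ABI` and the functions `addPair` and `callAddPair` are hypothetical names used only for illustration; they are not taken from the patch.

```cpp
#include <cassert> // used by the check in the usage note

// Assumption: the attribute is only applied on x86-64 with GCC or Clang.
// Elsewhere the macro expands to nothing so the sketch still compiles.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define EXAMPLE_MS_ABI __attribute__((ms_abi))
#else
#define EXAMPLE_MS_ABI
#endif

// Hypothetical function: only the calling convention changes. The
// signature, return type, and body are those of an ordinary function.
EXAMPLE_MS_ABI int addPair(int A, int B) { return A + B; }

// An ordinary caller. The compiler passes the arguments according to the
// callee's declared convention, so no source change is needed here.
int callAddPair() { return addPair(2, 3); }
```

With this sketch, `assert(callAddPair() == 5);` holds whether or not the attribute is active, which is the point: the attribute changes how arguments are passed, not what the function computes.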
With the latest version, do you see a similar speedup for both the probe profile and the DWARF profile?
This change speeds this up by grouping all the call frames within one LBR sample into a trie and aggregating the result (the sample counter) on it.
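The grouping idea can be sketched roughly as follows. This is an illustrative approximation, not the code from the patch: `FrameNode`, `addSample`, `lookupCount`, and `countNodes` are hypothetical names, and call frames are modelled as plain 64-bit addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical trie node: one node per call frame along a call stack.
struct FrameNode {
  std::uint64_t Count = 0; // samples whose stack ends exactly at this node
  std::map<std::uint64_t, std::unique_ptr<FrameNode>> Children;
};

// Walk the trie along Stack (outermost frame first), creating nodes as
// needed, and add Count to the node that the stack ends on.
void addSample(FrameNode &Root, const std::vector<std::uint64_t> &Stack,
               std::uint64_t Count) {
  FrameNode *Cur = &Root;
  for (std::uint64_t Addr : Stack) {
    std::unique_ptr<FrameNode> &Child = Cur->Children[Addr];
    if (!Child)
      Child = std::make_unique<FrameNode>();
    assert(Child && "child node must exist after insertion");
    Cur = Child.get();
  }
  Cur->Count += Count;
}

// Return the aggregated count for an exact stack, or 0 if it was never added.
std::uint64_t lookupCount(const FrameNode &Root,
                          const std::vector<std::uint64_t> &Stack) {
  const FrameNode *Cur = &Root;
  for (std::uint64_t Addr : Stack) {
    auto It = Cur->Children.find(Addr);
    if (It == Cur->Children.end())
      return 0;
    Cur = It->second.get();
  }
  return Cur->Count;
}

// Count the trie nodes below Node; shared prefixes are stored only once.
std::size_t countNodes(const FrameNode &Node) {
  std::size_t N = 0;
  for (const auto &Entry : Node.Children)
    N += 1 + countNodes(*Entry.second);
  return N;
}
```

Adding the stacks `{0x10, 0x20}` twice and `{0x10, 0x30}` once yields three nodes rather than six frames, because the common prefix `0x10` is shared and repeated stacks only bump a counter.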
A 5x speedup is a really impressive improvement. I am wondering whether there is callstack overlap between different LBR samples, so that you could group call frames further by reusing unwindState. You may also save some cost by reusing the frame trie. IIUC, although samples have been aggregated based on callstack, each LBR sample may have multiple callstacks inferred from unwindCall/unwindReturn. If there is callstack overlap between different LBR samples, you may be able to group them further.
Rebase after 560d7e04113bf.
Note that there's one additional issue right now. The TableGen cross-compilation sub-invokes CMake which fails with:
CMake Error at /src/clang-llvm/llvm-project/libc/CMakeLists.txt:49 (message):
Updated llvm/test/CodeGen/X86/critical-anti-dep-breaker.ll to show the whole test in https://reviews.llvm.org/D94215.
Changing help message for the switch.
- address @craig.topper's comment.
- rewrite the script in Python.
lgtm with minor comment.
Addressing Wenlei's feedback.
do not use 'else' after 'return'
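For reference, a minimal sketch of the shape this warning asks for. The function `classifySign` is a hypothetical example, not code from the patch under review.

```cpp
#include <cassert> // used by the check in the usage note

// Flagged form, shown only as a comment:
//   if (X < 0) return -1; else if (X == 0) return 0; else return 1;
//
// Preferred form: each early return ends its branch, so the 'else' is
// redundant and the remaining code stays at one indentation level.
int classifySign(int X) {
  if (X < 0)
    return -1;
  if (X == 0)
    return 0;
  return 1;
}
```

Both forms behave identically, for example `assert(classifySign(-5) == -1);`, so the change is purely about readability.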
In fact, if https://reviews.llvm.org/D95198 is acceptable it is a separate justification for this change. In order to place program headers somewhere other than the lowest VMA in the program image something like this needs to happen.
Addressing Wei's feedback.
If the direction of the patch is acceptable I can provide a detailed analysis of each required testcase adjustment.
Adopt rupprecht's suggestion
For out-of-order-sections.s, you could just swap foo and bar. The few updated tests seem to have undesired sh_offset changes.
(fix the description before pushing)
In ELF, we usually use yaml2obj to generate invalid object files, instead of checking in precanned binaries. You can find lots of grimar's changes migrating away from binaries.
I'll try to take a look by end of next week.
LGTM. Let's wait a while to see if anybody else has more comments, and make sure to update the commit message before pushing.
Fix this by moving lowering of llvm.amdgcn.init.exec post-RA
gentle reminder - thanks!
I notice a lack of any explicitly co_return-related tests and/or code in this patch. I'm just going to assume that is fine.
@jrtc27 just to let you know, I have the same concern; that's one major reason why we haven't upstreamed those extensions to the GNU toolchain. We intend to introduce an internal revision number in the ELF attribute in the near future, e.g. v-ext 0.9.1 / v0p9p1, to prevent compatibility issues here.
- Update test to avoid GlobalISel issue on Windows
- Tighten tests
Thanks for the updates. I think this proposal is ready to send to Chris (Step 4).
(Their problem stems from having 1.0 drafts before they've resolved all the outstanding issues and frozen the instruction set; if they didn't jump the gun then things would be saner for people implementing it)
- Address @craig.topper's comments.
- Update the test cases to use v8-v23 as arguments.
LGTM with some nits. We might want to rewrite these atomics with LLVM intrinsics.
There are a lot of "Resolve for v1.0" issues open against the spec still. Are we sure we want to brand this as 1.0? It will end up as such in the ELF attributes and thus be deemed compatible with future "real" 1.0 binaries.
We could keep the version number as v0.9, or do you think it is better to keep it as v1.020201218?
Dropped the forward declaration and rewrote the CUDA intrinsics with LLVM intrinsics.
Is this dependent on the frame lowering patch to emit the csrr vlenb?