- User Since
- Dec 12 2018, 5:57 PM (110 w, 6 d)
Change enum names to make them more accurate.
A minor refactor.
Reduce BB status to 3 cases, simplify code based on it.
Mon, Jan 25
Fix a bug when calculating indirect successors.
Sun, Jan 24
Small refactor: Change recursive DFS to iterative DFS when inserting tilerelease.
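The recursive-to-iterative change can be illustrated with a minimal Python sketch. This is not the patch's code: the `successors` mapping and the `iterative_dfs` helper are hypothetical stand-ins for a control-flow graph and its traversal, and only show how an explicit stack replaces recursion.

```python
def iterative_dfs(successors, entry):
    """Return the nodes reachable from `entry` in DFS preorder.

    `successors` is a hypothetical dict mapping a block name to the
    list of its successor block names.
    """
    visited = set()
    order = []
    stack = [entry]  # explicit stack replaces the call stack
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push successors in reverse so the first successor is popped
        # (and therefore visited) first, as in a recursive DFS.
        for succ in reversed(successors.get(node, [])):
            if succ not in visited:
                stack.append(succ)
    return order


# Hypothetical four-block graph used only for illustration.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["a", "exit"], "exit": []}
print(iterative_dfs(cfg, "entry"))
```

The explicit stack avoids deep recursion on large graphs, and the `visited` check on pop keeps the traversal safe in the presence of loops.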
Fix a missing case in function updateBBStatus.
Address Yuanke's comment. Moving the insertion to check AMX block.
Address Yuanke's comments.
Sat, Jan 23
A minor change.
This patch reduces the algorithmic complexity of function findNeedInsertBB, at the cost of slightly more code complexity than the previous version.
A minor refactor.
Remove an unnecessary ++MII;
- Address review comments from Xiang and Yuanke.
- Optimize the algorithm to a single DFS.
- Fix bugs when counting the place for tilerelease.
Thu, Jan 21
Fix Lint warnings.
Wed, Jan 20
I don't have much experience with performance tuning. Adding more people who contributed to the affected tests.
Tue, Jan 19
LGTM, but let's wait a day or two to see others' opinions.
Mon, Jan 18
Sun, Jan 17
Fri, Jan 15
This doesn't add metadata to llvm intrinsics that are not constrained.
Oh, right. I misunderstood what these patches are doing and thought we could now add metadata to any intrinsics via CGFPOptionsRAII. :-)
If a relevant #pragma is used then without this change the metadata ...
Yeah, I understand it now. Thank you! But why do we still have the wrong maytrap with this patch?
It's true that the middle-end optimizers know nothing about the constrained intrinsics. Today. I plan on changing that in the near future.
If use of the constrained intrinsics will cause breakage of the target-specific clang builtins then that's important to know and to fix.
I had a look at these changes and didn't find anything that will cause breakage for now. I'm still not sure whether we need to teach target-specific clang builtins to respect strict FP. But that has nothing to do with this patch.
Thu, Jan 14
What is the reason for treating this differently in LLVM?
tile config register is the user of each AMX instruction
Wed, Jan 13
Hi Kevin, what's the intention of adding constrained FP metadata for target-dependent builtins? I believe the middle-end passes always ignore these builtins. What's more, will it imply to users that these builtins behave differently under different FP models? That's not true for most platform builtins, since they just represent given instructions.
Tue, Jan 12
Mon, Jan 11
LGTM, but I'd like to see if @lebedev.ri has any objections.
Sun, Jan 10
to amx intrinsics the the
Fri, Jan 8
Thu, Jan 7
Is inline assembly the only case where the emms instruction is needed? But inline assembly doesn't enable the mmx attribute automatically, right? E.g. https://godbolt.org/z/43ases
Analyzing the asm block and appending the mmx attribute if we see mmx instructions might be needed. But if we do the analysis, just adding an emms instruction at the end of the block seems better.
Tue, Jan 5
I saw the document mention that ESP/RSP should be preserved for regcall. But I cannot think of a situation where ESP/RSP needs to be preserved either.
Mon, Jan 4
Wed, Dec 30
Let me know if I am not answering/understanding the questions.
Tue, Dec 29
Mon, Dec 28
Hi Sanjay, pairwise reduction doesn't seem to be adopted by any target. But do we need to consider the FMF here? I found LangRef says the intrinsic is sequential for fadd and fmul if the reassoc flag is not set.
Hi Simon, I found we have the same problem for fadd/fmul. See https://godbolt.org/z/3YKaGx
X86 intrinsics imply the reassoc flag, but llvm.vector.reduce.* doesn't.
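Why the reassoc flag matters for fadd can be seen in a small Python sketch using double-precision floats. The helper names `ordered_reduce_fadd` and `pairwise_reduce_fadd` are hypothetical, and the pairwise shape is only one possible reassociation, not necessarily what any particular compiler or target emits.

```python
def ordered_reduce_fadd(start, values):
    """Sequential reduction: ((start + v0) + v1) + ..., in lane order."""
    acc = start
    for v in values:
        acc = acc + v
    return acc


def pairwise_reduce_fadd(values):
    """Tree-shaped reduction: repeatedly add the upper half onto the
    lower half. Assumes the lane count is a power of two."""
    vals = list(values)
    assert vals and len(vals) & (len(vals) - 1) == 0, "power-of-two lanes"
    while len(vals) > 1:
        half = len(vals) // 2
        vals = [vals[i] + vals[i + half] for i in range(half)]
    return vals[0]


lanes = [1e17, 1.0, -1e17, 1.0]
print(ordered_reduce_fadd(0.0, lanes))  # 1.0: the first 1.0 is absorbed by 1e17
print(pairwise_reduce_fadd(lanes))      # 2.0: 1e17 and -1e17 cancel first
```

Both results are valid roundings of some evaluation order, but they differ, which is why reassociating a reduction is only legal when the flag permits it.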
Dec 24 2020
LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.
Dec 23 2020
In my test case, it is transformed after Combine redundant instructions.
Dec 22 2020
Dec 21 2020
Dec 19 2020
Dec 17 2020
Dec 16 2020
Dec 15 2020
Dec 14 2020
update_test_prefix.py assumes the conflicting output. You may need to change the expectation of it as well.
What's the difference with the existing code? It looks to me that you just brought the warning out of the loop, right?
Yes, it is using maxpd/minpd here. Sorry for missing the nan cases.
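The NaN cases come from the asymmetric per-lane behavior of maxpd/minpd, which can be modeled with a short Python sketch. The helper names `maxpd_lane` and `minpd_lane` are hypothetical; the model follows the documented rule that the second source operand is returned whenever the comparison is false, which includes any NaN input and the case of two zeros.

```python
import math


def maxpd_lane(src1, src2):
    """Model of one maxpd lane: src1 only if src1 > src2, else src2."""
    return src1 if src1 > src2 else src2


def minpd_lane(src1, src2):
    """Model of one minpd lane: src1 only if src1 < src2, else src2."""
    return src1 if src1 < src2 else src2


nan = float("nan")
print(maxpd_lane(nan, 1.0))  # 1.0: comparison is false, second operand wins
print(maxpd_lane(1.0, nan))  # nan: comparison is false, second operand wins
```

Because the result depends on operand order when a NaN is present, these instructions are not commutative and do not match IEEE-style maxnum/minnum semantics.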
In fact, I have doubts about the behavior when using fast math with target intrinsics. In my opinion, target intrinsics are always associated with given instructions (reduce* are exceptions). So the behavior of an intrinsic, e.g. how it respects NaNs, signaling, rounding, exceptions etc., is consistent with its associated instruction. That is to say, the fast math flags won't change the behavior of intrinsics.
Assuming the above, I'm happy to set these expansions to fast math. Either keeping the existing implementation or using the expansion LGTM.
Dec 11 2020
LGTM. Thanks for bringing this refactor.
I also verified that ICC and GCC both do reduce math in a binary tree way, though sometimes ICC has a different LSB from GCC and Clang.
Dec 10 2020
Looks like quite a few (3K) of the tests in CodeGen (14K or so) use it. So I wanted to understand why removing a prefix used nowhere in the test led to an incorrect rewrite of some assertions.
There are more than 6K tests. update_llc_test_checks.py is just one of the update scripts.
Add initial author to have a review.
I think you meant --allow-unused-prefixes=true.
Yes, that's what I meant. Thanks for correcting me.