- User Since
- Jan 23 2018, 4:17 PM (232 w, 14 h)
Apr 5 2022
Yep, there is a new dependency which I didn't catch because I use dynamic linking locally. I fixed it quickly, but your revert landed first :-)
Apr 4 2022
I assume there are no more comments for now. If you find anything, please let me know and I will fix it.
Typo fixed Residal->Residual
Rebase + formatting
Mar 30 2022
Looks good now?
Mar 29 2022
Mar 28 2022
Mar 15 2022
The latest one (4a0481e981b653153fd07db9cbee1197ecd388d3 ) helps. Thanks!
Thanks, I will check tomorrow.
Could you point me to the fixing commit?
Feb 25 2022
Feb 14 2022
The ball is in my court, but I can't get to it due to other higher-priority tasks. Please let me know if there is demand on your side so I can re-prioritize this one. Also, as mentioned earlier, help is more than welcome :-)
I'm extending the functionality of the memory lowering utilities, which are meant to be used in different contexts. At the moment there is no upstream pass that will take advantage of the added functionality immediately; rather, we have downstream functionality which will make use of it. In the future, upstream users can decide to utilize the functionality on a case-by-case basis... I don't have enough expertise in those areas to drive changes.
Feb 3 2022
Fixed according to comments
In order for this to work, I would need to modify the AMDGPULowerIntrinsics pass to provide an additional argument (the result of ScalarEvolutionAnalysis) to expandMemCpyAsLoop. Currently, this pass doesn't depend on SCEV analysis.
Jan 31 2022
Jan 28 2022
Source and destination of an atomic memcpy may not overlap, per the specification.
Jan 25 2022
This change implies that speculation is not beneficial regardless of whether the block is predicted to be taken or untaken. That essentially means the optimization is not beneficial in most cases of branches unpredicted at compile time either, since the hardware will actually predict well in most cases. While there is an explicit note that it's hard to do proper cost modeling at the IR level, there are still some simple cases which are beneficial. So I think we will regress in some cases.
Jan 24 2022
I don't know either. As far as I can judge from common logic and the implementation of CallBase::hasReadingOperandBundles() (especially the comment), the same rules should apply to intrinsics.
Jan 14 2022
Thank you all for your input. The reason the generic argmemonly case (pointed out by @nikic) doesn't handle my case is the presence of a deopt bundle on the call. Currently, BasicAAResult::getModRefBehavior(const CallBase *Call) simply ignores function attributes in the presence of deopt bundles. Per the specification: "From the compiler’s perspective, deoptimization operand bundles make the call sites they’re attached to at least readonly. They read through all of their pointer typed operands (even if they’re not otherwise escaped) and the entire visible heap". That has two consequences:
- The current code which special-cases AnyMemCpyInst (the one I was trying to extend to any memory transfer) is incorrect, because it may return NoModRef if the memcpy intrinsic has operand bundles. Should it be fixed or just removed (it can only be removed once the next item is resolved, though)?
- It's impossible to describe memcpy/memmove modref behavior with the existing encoding defined by FunctionModRefBehavior. In particular, for a memmove/memcpy with a deopt bundle we need to encode "reads everything, writes inaccessible or argmemonly". The current scheme doesn't support setting a different modref for an individual type of memory. The best solution I have come up with so far is to split the available space (currently 32 bits, given by "enum FunctionModRefBehavior") into 4 regions (4 regions are enough to represent the 8 variants defined by "enum class ModRefInfo") and associate each region with a specific modref value from the ModRefInfo enum. With this approach we can represent any variation for 8 independent memory types in 32 bits, which seems to be enough (assuming the modref info won't change). In case that turns out to be insufficient, we can easily double the number of supported attributes by switching to 64 bits.
To make it clearer, here is an example. Say we need to encode "reads everything, writes inaccessible or argmemonly". Assume we split our 32 bits into regions in the following way: "MustMod | MustRef | Mod | Ref" (each region is 8 bits, since 32 bits divided by 4 regions gives 8 bits per region). Then the value would be "FMRL_Nowhere | FMRL_Nowhere | (FMRL_ArgumentPointees | FMRL_InaccessibleMem) | FMRL_Anywhere".
Do you think this is reasonable? Anything better?
Anyway I assume it should be discussed on dev list...
Jan 12 2022
Dec 14 2021
Dec 13 2021
Dec 10 2021
Dec 9 2021
I see. Thank you.
Dec 6 2021
Do you mean (by "doesn't work") that if the executable refers to a global defined in LLVM.*so, then each of them will use its own copy?
I don't see what's wrong here (maybe you have a specific example in mind). IMHO, quite the opposite: LLVM will consistently use its own instance while the user's binary uses its own, as if there were two globals with different names in the first place. I have exactly that real-life case, where an LLVM function (coming from libLLVM*.so, since functions are not interposed) refers to a global from the user's binary with the same name but completely different semantics.
Dec 3 2021
While -Bsymbolic-functions brings nice performance improvements, it also changes the symbol resolution order. That means we effectively disable preemption for functions, and all references from inside libLLVM*.so will be resolved locally. But references to global data can still be interposed by external definitions. Do I understand correctly that the main argument against using -Bsymbolic is the potential issue with equality comparison of addresses of globals? Or is there anything else?
Nov 25 2021
Nov 24 2021
Please note we still vectorize the epilog with VF 8 while the remainder trip count is known to be 1. That should be addressed separately, though.
Nov 22 2021
Merge with D112312
Nov 19 2021
I'm totally fine if you commit the changes yourself. I will only be able to get to it early next week.
Nov 15 2021
Nov 11 2021
Nov 7 2021
PS: I think it should be possible to create a targeted test case by hand, though.
It seems that the new heuristic is applied not only to loops in rotated form. See, for example, "test_cold_loop" or "test13".
In general, we should decrease the probability of taking a particular back edge proportionally to the number of added exits. For example, if the number of exits equals the number of predicted iterations, then each particular exit should be 50/50. But this is not supported yet anyway.
Oct 20 2021
Oct 19 2021
The fix is out for review: https://reviews.llvm.org/D112060
Oct 18 2021
There was another infinite loop example attached to the commit message for this patch - https://reviews.llvm.org/rG83d134c3c4222e8b8d3d90c099f749a3b3abc8e0
I'm not sure if that is the same root cause, but I don't think there was any reply to that failure.
Oct 1 2021
Sep 30 2021
There is no cost associated with the runtime checks for epilogue vectorization. Thus we should simply initialize 'MinProfitableTripCount' with an 'unset' value in the Epilogue Vectorizer.
Sep 27 2021
Right; while the cost of runtime checks is (at the moment) VF-independent, we can decide not to do it inside the cost model. Though conceptually it looks like a much more solid design when the CostModel is responsible for cost-based decisions, and the total code complexity will be even lower than doing cost modeling in different places.
Generally LGTM (except the few mentioned nits).
Sure, let's discuss the move to the cost model separately. Having said that, the first four patches in the series (including D109443) are general improvements independent of the change to the cost model itself.
Of course, there is no strong dependency, and D109443 should not block this one.
Sep 26 2021
Honestly, I find the current API a bit confusing. I see the following drawbacks, in priority order:
- The current API is a potential source of future errors, since it is easy to misplace a call to Create() or miss it completely, and there is no easy way to detect the mistake.
- It negatively affects the end-user experience of the existing API.
- It requires code changes which could otherwise be avoided.