User Details
- User Since: May 3 2019, 8:38 AM
Wed, Jun 29
LGTM. Thanks!
Tue, Jun 28
Could you please add the crashing IR from https://reviews.llvm.org/D123720 as an additional test? It still seems to crash with this patch!
Mon, Jun 27
The description should explain why this change is necessary.
Fri, Jun 24
The cost overflow is a known issue (see https://github.com/llvm/llvm-project/issues/55233). I wonder if specifying a larger cache line size (e.g. -cache-line-size=128 or 256) would work around the ubsan failures for now. Otherwise we'd have to fix 55233 before recommitting this, I guess.
Thu, Jun 23
LGTM. Thanks!
The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second is a VPInstruction, neither of which overrides usesScalars to return true. It doesn't seem safe to make usesScalars return true for these kinds of recipes, so perhaps we still need to check IV->needsVectorIV()?
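For concreteness, the check I'm imagining would be shaped roughly like this (entirely hypothetical code, not the actual patch; usesScalars and needsVectorIV are the entities discussed above):

```
// Hypothetical sketch: only skip creating the vector induction when
// every user of the widened induction can make do with scalar lanes.
bool AllUsersUseScalars = llvm::all_of(
    IV->users(), [&](VPUser *U) { return U->usesScalars(IV); });
if (IV->needsVectorIV() && !AllUsersUseScalars) {
  // Recipes like VPFirstOrderRecurrencePHIRecipe still need the whole
  // vector value, so keep widening the induction.
}
```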
Wed, Jun 22
Hi @fhahn, this looks like a regression from your commit below:
Thu, Jun 16
Fri, Jun 10
This option would also be available in llc, so the following in the description isn't quite accurate:
Thu, Jun 9
Thanks for the comment, I see your concern. Would you like me to continue D127342, or would you like me to keep the cache line size option under LoopCacheCost.cpp as we did in this patch? To avoid breaking computers with no cache, I could update CLS in LoopCacheCost.cpp to the following (I may need to change the wording a little bit):

```
CLS = DefaultCacheLineSize.getNumOccurrences() > 0 ? DefaultCacheLineSize
                                                   : TTI.getCacheLineSize();
```

and pass -cache-line-size=64 to all loop interchange tests. This way the buildbot test failures can be avoided, and in the meantime it won't break on a real-world computer with no cache.
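For reference, here is a minimal sketch of how that could look in LoopCacheCost.cpp (the exact cl::opt spelling, description text, and the helper function are my assumptions, not the actual patch):

```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Support/CommandLine.h"

// Hypothetical option definition; the flag name matches the one used in
// the tests above, the description is illustrative.
static llvm::cl::opt<unsigned> DefaultCacheLineSize(
    "cache-line-size", llvm::cl::init(64), llvm::cl::Hidden,
    llvm::cl::desc("Default cache line size in bytes, used when the "
                   "target does not report one"));

// Prefer an explicit -cache-line-size=N; otherwise fall back to the
// TTI-reported value, which may be 0 on targets with no cache info.
static unsigned pickCacheLineSize(const llvm::TargetTransformInfo &TTI) {
  return DefaultCacheLineSize.getNumOccurrences() > 0
             ? DefaultCacheLineSize
             : TTI.getCacheLineSize();
}
```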
Should the setting be applied directly in TTI::getCacheLineSize?
Wed, Jun 8
https://reviews.llvm.org/D110973 has landed. This should be safe to re-commit now. Thanks!
Some additional comments:
Mon, Jun 6
Jun 3 2022
For the ones that had x86 previously I can reset it to x86, and for the ones that do not have a target triple I can remove it.
Having all tests depend on a single target is not ideal, because this limits the number of bots/configurations that run the tests. I think we need a way to keep most of the tests independent of the PowerPC or any other target and only have tests that explicitly test the cost-modeling depend on the target.
Jun 2 2022
Rebased and added Michael's test case.
Following the discussion in the comments, removed the expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch).
May 31 2022
LGTM. Thanks!
...With LNICM the load of z is again hoisted into entry, and since entry is the preheader of the original outer loop, after interchange the load of z is moved into for.body3.preheader, which is the new outer loop preheader... Before this patch, the entry block is not considered part of the loop...
May 30 2022
May 26 2022
Other possible fixes include modifying the gep instructions in Transforms/LICM/lnicm.ll to make the memory accesses delinearizable and analyzable. I'd be fine with any of the possible solutions. I'd appreciate it if you could let me know your thoughts, @bmahjour.
May 25 2022
The following crashes even with this patch:
LGTM
May 24 2022
May 19 2022
Other than the inline comments looks good to me.
Ok, LGTM. Thanks!
May 17 2022
IIRC the reason for calling formLCSSARecursively was to make sure we can handle live-outs and reductions properly, which we seem to still be able to do (based on the LIT test updates). The call to simplifyLoop was to ensure we don't trip any assumptions about loop-simplify form when processing the epilogue loop. It may not be an issue now, but would it be a problem, say, if a hypothetical utility function common to both paths wanted to assert that the loop is in simplified form? Is keeping simplifyLoop() harmless to VPlan's modeling of exit values?
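To make the concern concrete, the kind of shared helper I have in mind would look something like this (entirely hypothetical; Loop::isLoopSimplifyForm() is the existing query):

```
#include "llvm/Analysis/LoopInfo.h"
#include <cassert>

// Hypothetical utility shared by the main and epilogue vector loop
// paths; it would silently depend on simplifyLoop() having been run.
static void processVectorizedLoop(llvm::Loop *L) {
  assert(L->isLoopSimplifyForm() &&
         "expected the epilogue loop to be in loop-simplify form");
  // ... common handling of live-outs, metadata, etc. ...
}
```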
May 16 2022
Although the problem reported in PR55233 is worsened by D123400, the underlying issue existed even before that (as illustrated by the example in PR55233), so this patch won't completely solve the overflow problem, although it would make it a bit less likely to occur. I think a proper solution to PR55233 would need to address the overflow problem itself (e.g. by increasing the range of values that the cost model can represent).
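As a self-contained illustration of that last point (plain C++, not the actual LLVM cost type): saturating the multiplication keeps a product of large factors from wrapping into a small or negative value, which is one way to widen the range the model can safely represent.

```
#include <cstdint>
#include <cstdio>
#include <limits>

// Saturate instead of wrapping on overflow (GCC/Clang builtin).
static uint64_t saturatingMul(uint64_t A, uint64_t B) {
  uint64_t R;
  if (__builtin_mul_overflow(A, B, &R))
    return std::numeric_limits<uint64_t>::max();
  return R;
}

int main() {
  // 2^40 * 2^30 would wrap a 64-bit product if one more factor joined;
  // the saturating version degrades gracefully instead.
  uint64_t TripCount = 1ull << 40, RefCost = 1ull << 30;
  uint64_t Cost = saturatingMul(saturatingMul(TripCount, RefCost), 1ull << 10);
  std::printf("%llu\n", (unsigned long long)Cost);
}
```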
May 13 2022
Fixed test2 to make sure that, when D71539 is applied, it fails without this patch and passes with it.
May 5 2022
May 4 2022
May 3 2022
Used getLoopDisposition and improved tests.
@Meinersbur Thanks for the test case that fails without D71539. The test case I reported was bugpoint-reduced, which is why the undefs are there. I'll update D110973 with a test that doesn't use undef and that passes today but would fail with D71539. I'll also add your test to it.
LGTM
May 2 2022
Sorry for the delay, as I didn't get a chance to commit this before I went on vacation.
Apr 20 2022
Added a test for the multi-store case I mentioned above.
This has been superseded by https://reviews.llvm.org/D71539
Apr 19 2022
Apr 18 2022
What I did was to take the stride into account as a second component and, for each loop, take the maximum stride over all reference groups to get the final stride, which presumably could resolve the motivating problem too.
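A minimal sketch of that aggregation, with a stand-in struct rather than the actual LoopCacheAnalysis types:

```
#include <algorithm>
#include <vector>

struct RefGroupInfo {
  unsigned Stride; // stride of this reference group w.r.t. the loop
};

// The loop's stride component is the maximum over all reference groups.
unsigned loopStrideComponent(const std::vector<RefGroupInfo> &Groups) {
  unsigned MaxStride = 0;
  for (const RefGroupInfo &G : Groups)
    MaxStride = std::max(MaxStride, G.Stride);
  return MaxStride;
}
```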
Apr 11 2022
@nikic it may be a good idea to have a discussion about this (impact of opaque pointers on delinearization) at one of the upcoming LoopOptWG calls. If you care to join us, please let me know or add an agenda item to https://docs.google.com/document/d/1sdzoyB11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit so we can discuss this in more depth. Thanks!
You mention that the SCEV-based approach cannot handle fixed-sized arrays. Could you please explain in more detail why? It's not really obvious to me.
The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form <op>-then-<lop> (e.g. RecurKind::AddThenAnd, RecurKind::MulThenAnd, RecurKind::OrThenAnd, and so on)... and that's just for combining logical and with the known integer reduction ops, so if we want to support e.g. or we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the <op>, and the <lop> has to be applied when reducing the final vector into a single scalar upon loop exit.
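To make the blow-up concrete, the proposal amounts to an enum along these lines (names extrapolated from the ones above; hypothetical, not actual LLVM code):

```
// Each integer reduction op would need an <op>-then-<lop> variant per
// logical op, so supporting 'or' as well doubles the set again.
enum class CompositeRecurKind {
  AddThenAnd, MulThenAnd, OrThenAnd, AndThenAnd, XorThenAnd, // with 'and'
  AddThenOr,  MulThenOr,  OrThenOr,  AndThenOr,  XorThenOr,  // with 'or'
};
```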
Apr 8 2022
Thanks for looking into this, @congzhe. As you described, we don't seem to be able to distinguish between the cost of moving outer loops in nests more than 2 levels deep. The data locality paper tries to estimate the cost based on the estimated number of cache lines used when moving a loop into the innermost level. Since the stride information (when it's less than the cache line size) is already used in the RefCost functions, it makes more sense to somehow integrate it into the cost formula when it's larger than the cache line size. Treating stride as a separate component of the cost and associating it with the loop (as opposed to each reference group) makes the result of the analysis more complicated to consume and can cause ambiguity when dealing with multiple reference groups.
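For reference, the per-reference cost in that scheme is roughly the following (my paraphrase, with CLS denoting the cache line size):

```
\mathrm{RefCost} =
\begin{cases}
\mathrm{TripCount} \cdot \dfrac{\mathrm{Stride}}{\mathrm{CLS}} & \text{if } \mathrm{Stride} < \mathrm{CLS},\\[4pt]
\mathrm{TripCount} & \text{otherwise,}
\end{cases}
```

so all strides at or above CLS are costed identically, which would explain why that information is currently lost for outer-loop comparisons.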
Apr 1 2022
LGTM. Thanks.
Mar 9 2022
With the recent updates we are no longer losing test coverage. Per discussion in the Loop Opt WG call, I'll approve to get this in as a small incremental improvement. I'll look into https://github.com/llvm/llvm-project/issues/54176 and follow up with more patches.
Mar 7 2022
Mar 3 2022
Mar 2 2022
Feb 25 2022
I'm generally in favour of this patch, but I'm just not sure if there is (or should be) a guarantee that SCEV will NOT produce an AddRec in overflow situations. For example, when a GEP has the inbounds attribute, the produced address after adding the index/offset cannot cause an unsigned overflow, so I'm not sure why SCEV would produce different results depending on whether the index/offset could wrap! SCEV seems to produce an AddRec for the offset/index (e.g. %i.add) regardless of whether it could wrap or not. It's just the GEP that seems to be behaving the way you described.
Feb 23 2022
Thanks for this @Meinersbur. Could we replace "https://ibm.webex.com/ibm/j.php?MTID=mad5dd5ca19e0423759b0c507245e4ccc" everywhere it occurs with "https://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6", so that the meeting link is consistent with the info in the Google doc?
Feb 8 2022
Even though the initial intention of the "hack" might have been to prevent predication on some X86 subtargets, it was added in a way that affected all targets, so suddenly removing it could expose cost model issues on various targets. I wasn't aware of this and just noticed it... Have any performance runs been done on other targets (e.g. Power)?
Feb 4 2022
Feb 1 2022
Apart from some minor inline comments this revision addresses all my outstanding comments. LGTM.
Jan 31 2022
Couldn't the same problem happen in theory without control flow divergence? For example, consider a loop like this:
Jan 27 2022
Jan 17 2022
Jan 14 2022
This commit somehow causes the "unroll.disable" metadata to be dropped from the vector epilogue loops. For example: