Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

bmahjour (Bardia Mahjour)
User

Projects

User does not belong to any projects.

User Details

User Since
May 3 2019, 8:38 AM (230 w, 3 d)

Recent Activity

May 15 2023

bmahjour added a reviewer for D150120: [RFC] Vector math function loop idiom recognition: masoud.ataei.
May 15 2023, 6:32 AM · Restricted Project, Restricted Project, Restricted Project

Jan 10 2023

bmahjour accepted D135808: [LoopInterchange] Correcting the profitability check.

LGTM with some minor comments.

Jan 10 2023, 2:34 PM · Restricted Project, Restricted Project, Restricted Project

Dec 15 2022

bmahjour added inline comments to D135808: [LoopInterchange] Correcting the profitability check.
Dec 15 2022, 12:29 PM · Restricted Project, Restricted Project, Restricted Project

Nov 16 2022

bmahjour accepted D137461: [LoopInterchange] Refactor and rewrite validDepInterchange().

The two changes (Direction == '*' and checking the dep vector before doing the interchange) may add safety, but also cause tests to fail.

@Narutoworld your latest patch calls isLexicographicallyPositive once before the swap and once afterwards, and I don't see any XFAILed tests in your update. Have you check to see if all LIT tests pass?

Hi @bmahjour
I have run the whole LIT tests with ninja check-all. No failed tests found.

Nov 16 2022, 12:04 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D137461: [LoopInterchange] Refactor and rewrite validDepInterchange().

The two changes (Direction == '*' and checking the dep vector before doing the interchange) may add safety, but also cause tests to fail.

Nov 16 2022, 11:38 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D137461: [LoopInterchange] Refactor and rewrite validDepInterchange().

Think of the * being in another function that calls a function containing the < < loops.

Nov 16 2022, 11:35 AM · Restricted Project, Restricted Project, Restricted Project

Nov 15 2022

bmahjour added inline comments to D137461: [LoopInterchange] Refactor and rewrite validDepInterchange().
Nov 15 2022, 2:15 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D137461: [LoopInterchange] Refactor and rewrite validDepInterchange().

This is definitely an improvement and I'm glad to see this change, but please see my inline comments.

Nov 15 2022, 2:13 PM · Restricted Project, Restricted Project, Restricted Project

Nov 2 2022

bmahjour added a project to D136415: [LSR] Check if terminating value is safe to expand before transformation: Restricted Project.
Nov 2 2022, 8:51 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D135808: [LoopInterchange] Correcting the profitability check.

This looks better now, but the problem of "endless interchange" is still not addressed. Per discussion in the loop opt call, we should only use CacheCost when it was able to calculate a valid cost and the loops can be sorted based on "strictly less than" ordering relationship. Only when CacheCost result is indeterminate (eg. two loops have equal cost) or when it is unable to calculate the cost (eg. due to delinearization issues), we should fall back to the isProfitableForVectorization() function.

Nov 2 2022, 7:54 AM · Restricted Project, Restricted Project, Restricted Project

Nov 1 2022

bmahjour added a comment to D134930: [LoopInterchange] Do not interchange when a reduction phi in all subloops of the outer loop is not recognizable.

The current output from loop interchange is wrong, but it could be corrected if we move the vector.reduce and the store to the for.inc19.i block; ie:

middle.block:                                     ; preds = %for.inc19.i
  %inc17.i = add nuw nsw i16 %j.010.i, 1
  %exitcond12.not.i = icmp eq i16 %inc17.i, 4
  br i1 %exitcond12.not.i, label %test.exit, label %for.j
Nov 1 2022, 1:31 PM · Restricted Project, Restricted Project
bmahjour requested changes to D136277: [LoopInterchange] Simplify DepMatrix to a dependency vector..

This would collapse two dimensions of the dependency matrix into one without providing a way to recover the individual dependency vectors, and as you pointed out in the description it could make some interchangeable cases look illegal. Another example is if we have [S, >, = ] and [S, <, =], we can safely interchange the two inner loops (by factoring out the S from the beginning of both vectors), but after merging them we get [S, <>, = ] and now the inner two loops can no longer be interchanged.

Nov 1 2022, 11:21 AM · Restricted Project, Restricted Project

Oct 18 2022

bmahjour added a comment to D135808: [LoopInterchange] Correcting the profitability check.

If outer loop dependency direction "=" and Inner loop dependency direction is "S" and "I" then, loop interchange is considered as profitable. Only two cases of dependency is profitable. But for vectorization, ">" and "<" dependency in outer loop is more profitable when interchanged. After patch [=,<] and [=,>] will be interchanged for vectorization.

I thought what you meant is that after this patch, [<, =] and [>, =] (not [=,<] and [=,>]) will be interchanged? Because after interchange the dependency vector would become [=, <] and [=, >] respectively, which could improve potential parallelization and enable finer-grained parallelism, i.e., outer loop parallelism instead of inner loop parallelism. I think this is what isProfitableForVectorization() is supposed to be.

I wonder if it makes sense to you @bmahjour ?

Oct 18 2022, 7:18 AM · Restricted Project, Restricted Project, Restricted Project

Oct 17 2022

bmahjour added a comment to D135808: [LoopInterchange] Correcting the profitability check.

The proper profitability analysis for loop interchange uses CacheCost, although there are cases where it may be unable to analyze certain accesses (eg due to delinearization issues). It would be good to understand why we don't go through the CacheCost path in your use case.

Oct 17 2022, 6:38 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a project to D135808: [LoopInterchange] Correcting the profitability check: Restricted Project.
Oct 17 2022, 6:10 AM · Restricted Project, Restricted Project, Restricted Project

Oct 11 2022

bmahjour added a comment to D135451: [TTI] New PPC target hook enableUncondDivisionSpeculation.

As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86).

The LLVM IR rule is more driven by the behavior of the instructions on various targets. If a target only has a trapping divide, we'd need to wrap it in control flow to implement a non-trapping divide. And particularly for signed divide, that check isn't cheap. We tend to prefer poison where it makes sense (for example, out-of-bounds shifts).

Frontends can always use control flow to get whatever user-visible behavior they want. (For example, the Rust divide operator panics if you divide by zero.)

To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI)

We try to avoid making core IR semantics depend on TTI. Not that we can completely ignore target differences when writing IR optimizations, but we want to keep IR understandable without reference to target-specific semantics.

I mean, it would be self-consistent to write in LangRef something like "whether division by zero is undefined behavior, or defined to produce a poison value, depends on the current target/current subtarget/bitwith of the operation/current moon cycle". But I don't want to go there. If the rules are the same across all targets, it's easier to understand, and easier to implement tools like Alive2 to validate transforms.

Oct 11 2022, 8:04 AM · Restricted Project, Restricted Project

Oct 7 2022

bmahjour added a comment to D135451: [TTI] New PPC target hook enableUncondDivisionSpeculation.

IIUC this proposal would effectively re-define udiv and urem's semantics on the IR level to not have undefined behavior for PPC?

I don't think that's quite correct. We still view them as undefined,

division by zero is currently undefined behavior at the IR level; if your program would execute it, it has no meaning at all. So hoisting a divide will interact badly with other optimizations; for example, instcombine will currently turn a divide by zero into "unreachable". This is different from instructions that return poison.

If you want a version of division that returns a poison value, you need to modify the semantics in LangRef.

As I understand it integer divide by zero is considered undefined behaviour in C while the same may not be true in other languages. Furthermore presence of side effects may be dependent on other configurations including target hardware (eg default treatment on Power vs x86). LLVM seems to distinguish the concept of "undefined behaviour" from "undefined value", treating the former more consequential than the latter. It currently treats div-by-zero as undefined behaviour, but that may be an over pessimistic treatment as demonstrated in this review. Baking assumptions about the source language or target hardware in the LLVM IR gets us into the situation were we have to sacrifice performance for some combinations to ensure functional correctness for others. To allow more flexibility we could either leave the IR neutral and let the optimizer decide based on config info (eg. TTI) or separate the undefined behaviour/value semantics from the udiv/urem instructions (eg. using an instruction flag). I think this revision takes the first approach and what you are suggesting is the second. I agree the second approach is cleaner and might be necessary given the historic assumptions made in this regard, although it would be a larger effort.

Oct 7 2022, 2:32 PM · Restricted Project, Restricted Project

Sep 21 2022

bmahjour accepted D132055: [LoopInterchange][PR57148] Ensure LCSSA form after loop interchnange.

LGTM

Sep 21 2022, 8:18 AM · Restricted Project, Restricted Project, Restricted Project

Aug 22 2022

bmahjour added inline comments to D131606: [Loop Fusion] Sink/hoist memory instructions between loop fusion candidates.
Aug 22 2022, 9:55 AM · Restricted Project, Restricted Project
bmahjour added inline comments to D132055: [LoopInterchange][PR57148] Ensure LCSSA form after loop interchnange.
Aug 22 2022, 9:53 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D132199: [LoopPassManager] Ensure to construct loop nests with the outermost loop.
Aug 22 2022, 9:52 AM · Restricted Project, Restricted Project, Restricted Project

Aug 3 2022

bmahjour accepted D130189: [LoopInterchange][PR56275] Fix legality in dependence vectors.
Aug 3 2022, 7:08 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour accepted D130188: [DependenceAnalysis][PR56275] Normalize dependence analysis results to be non-negative when required.
Aug 3 2022, 7:07 AM · Restricted Project, Restricted Project, Restricted Project

Jul 27 2022

bmahjour added inline comments to D130188: [DependenceAnalysis][PR56275] Normalize dependence analysis results to be non-negative when required.
Jul 27 2022, 6:51 AM · Restricted Project, Restricted Project, Restricted Project

Jul 25 2022

bmahjour added inline comments to D130188: [DependenceAnalysis][PR56275] Normalize dependence analysis results to be non-negative when required.
Jul 25 2022, 9:47 AM · Restricted Project, Restricted Project, Restricted Project

Jul 21 2022

bmahjour added inline comments to D130188: [DependenceAnalysis][PR56275] Normalize dependence analysis results to be non-negative when required.
Jul 21 2022, 1:21 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D130188: [DependenceAnalysis][PR56275] Normalize dependence analysis results to be non-negative when required.

Since the normalize() function is conceptually an operation that applies to the result (as opposed to the analysis itself), it makes more sense for it to be a member of the Dependence class. Client code, such as loop interchange, would then do something like this:

Jul 21 2022, 1:15 PM · Restricted Project, Restricted Project, Restricted Project

Jun 29 2022

bmahjour accepted D128653: [PowerPC] Fix the check for scalar MASS conversion.

LGTM. Thanks!

Jun 29 2022, 2:40 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D128755: [VPlan] Make sure optimizeInductions removes wide ind from scalar plan..

Address latest comments, thanks!

could you please add the crashing IR from https://reviews.llvm.org/D123720 as an additional test? It still seems to crash with this patch!

Is it still crashing with this patch applied to current main? It needs b18141a8f29f638be0602aa6ffbfc2d434b0b74a as well.

Jun 29 2022, 9:18 AM · Restricted Project, Restricted Project

Jun 28 2022

bmahjour added a comment to D128755: [VPlan] Make sure optimizeInductions removes wide ind from scalar plan..

could you please add the crashing IR from https://reviews.llvm.org/D123720 as an additional test? It still seems to crash with this patch!

Jun 28 2022, 3:39 PM · Restricted Project, Restricted Project
bmahjour added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Note this is also related to rG85983ca42ec6102b1692d0451090cacd002c958f, so simply reverting the above commit won't fix the case I quoted above.

Thanks, I put up D128755 to fix it by making sure that we replace all widen-inductions with scalar steps for plans with scalar VFs.

Jun 28 2022, 3:38 PM · Restricted Project, Restricted Project
bmahjour added inline comments to D128755: [VPlan] Make sure optimizeInductions removes wide ind from scalar plan..
Jun 28 2022, 3:36 PM · Restricted Project, Restricted Project
bmahjour added inline comments to D128653: [PowerPC] Fix the check for scalar MASS conversion.
Jun 28 2022, 2:46 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Jun 27 2022

bmahjour added inline comments to D128653: [PowerPC] Fix the check for scalar MASS conversion.
Jun 27 2022, 2:17 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D128653: [PowerPC] Fix the check for scalar MASS conversion.

The description should explain why this change is necessary.

Jun 27 2022, 9:43 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Jun 24 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

The cost overflow is a known issue (see https://github.com/llvm/llvm-project/issues/55233). I wonder if specifying a larger cache line size (eg. -cache-line-size=128 or 256 ) would workaround the ubsan failures for now. Otherwise we'd have to fix 55233 before recommitting this, I guess.

Jun 24 2022, 7:45 AM · Restricted Project, Restricted Project, Restricted Project

Jun 23 2022

bmahjour accepted D124926: [LoopInterchange] New cost model for loop interchange.

LGTM. Thanks!

Jun 23 2022, 10:20 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Jun 23 2022, 9:40 AM · Restricted Project, Restricted Project

Jun 22 2022

bmahjour added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

Hi @fhahn , this looks like a regression from your commit below:

Jun 22 2022, 12:17 PM · Restricted Project, Restricted Project

Jun 16 2022

bmahjour accepted D127342: [TargetTransformInfo] Added an option for the cache line size.
Jun 16 2022, 11:37 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Jun 16 2022, 10:49 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a reviewer for D127342: [TargetTransformInfo] Added an option for the cache line size: sfertile.
Jun 16 2022, 10:48 AM · Restricted Project, Restricted Project, Restricted Project

Jun 10 2022

bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Jun 10 2022, 11:41 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D127342: [TargetTransformInfo] Added an option for the cache line size.

This option would also be available in llc, so the following in the description isn't quite accurate:

Jun 10 2022, 10:44 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Jun 10 2022, 10:38 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a project to D125847: LTO: Add option to initialize with opaque/non-opaque pointer types: Unknown Object (Project).
Jun 10 2022, 8:18 AM · Unknown Object (Project), Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D125847: LTO: Add option to initialize with opaque/non-opaque pointer types.
Jun 10 2022, 8:18 AM · Unknown Object (Project), Restricted Project, Restricted Project, Restricted Project

Jun 9 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Thanks for the comment, I see your concern. Would you like me to continue D127342, or would you like me to still keep the cache line size option under LoopCacheCost.cpp as we did in this patch? To avoid breaking computers with no cache, I could update CLS in LoopCacheCost.cpp to the following (I may need to change to wording a little bit) :

CLS = DefaultCacheLineSize.getNumOccurrences() > 0  ? DefaultCacheLineSize : TTI.getCacheLineSize()

and pass -cache-line-size=64 to all loop interchange tests. This way the buildbot test failures can be avoided, and in the meantime for a realworld computer with no cache it won't break.

Jun 9 2022, 1:08 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Jun 9 2022, 9:50 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Should the setting be directly be applied in TTI::getCacheLineSize

Jun 9 2022, 9:48 AM · Restricted Project, Restricted Project, Restricted Project

Jun 8 2022

bmahjour added a comment to D71539: [SCEV] Look through trivial PHIs..

https://reviews.llvm.org/D110973 has landed. This should be safe to re-commit now. Thanks!

Jun 8 2022, 1:45 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Thanks for pushing the fix through! I think this should unblock D71539 now.

Jun 8 2022, 1:40 PM · Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Some additional comments:

Jun 8 2022, 8:43 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour committed rGc9677f6db4b2: [DA] Handle mismatching loop levels by considering them non-linear (authored by bmahjour).
[DA] Handle mismatching loop levels by considering them non-linear
Jun 8 2022, 8:16 AM · Restricted Project, Restricted Project
bmahjour closed D110973: [DA] Handle mismatching loop levels by considering them non-linear.
Jun 8 2022, 8:16 AM · Restricted Project, Restricted Project

Jun 6 2022

bmahjour added a comment to D123408: [InstCombine] Limit folding of cast into PHI.

The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form <op>-then-<lop> (eg RecurKind::AddThenAnd, RecurKind::MulThenAnd, RecurKind::OrThenAnd, and so on)...and that's just for combining logical and with the known integer reduction ops, so if we want to support e.g. or we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the <op>, and the <lop> has to be applied when reducing the final vector into a single scalar upon loop exit.

@lebedev.ri is this what you had in mind or is there a better way to do it?

@lebedev.ri Can you please advise if the above described way is how we would implement this within the LoopVectorizer?

Sorry, lost track here. I'm not familiar enough with LV to recommend the solution,
but it sounds vaguely reasonable to me. But, do you need the whole <op>-then-<lop> generality?
The only reason why <op>-then-and is useful, is because that and specifies
the effective bitwidth of the reduction, but if the high bits aren't demanded arithmetic/logic ops can be losslessly performed in narrower bit widths:
i32 65535 + i32 65535 = i32 131070 = 0x1FFFE, (trunc(i32 65535) to i8) + (trunc(i32 65535) to i8) = i8 510 = 0xFE, note how low 8 bits are the same.
Perhaps the solution should be around tracking the demanded bit width?

Jun 6 2022, 10:44 AM · Restricted Project, Restricted Project

Jun 3 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

For the ones that had x86 previously I can reset it to x86, and for the ones that do not have a target triple I can remove it.

Jun 3 2022, 7:58 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Having all tests depend on a single target is not ideal, because this limits the number of bots/configurations that run the tests. I think we need a way to keep most of the tests independent of the PowerPC or any other target and only have tests that explicitly test the cost-modeling depend on the target.

Jun 3 2022, 7:15 AM · Restricted Project, Restricted Project, Restricted Project

Jun 2 2022

bmahjour added inline comments to D110973: [DA] Handle mismatching loop levels by considering them non-linear.
Jun 2 2022, 1:35 PM · Restricted Project, Restricted Project
bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Rebased and added Michael's test case.

Jun 2 2022, 1:34 PM · Restricted Project, Restricted Project
bmahjour added a comment to D109584: [VP] Implementing expansion pass for VP load and store..

Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)

Jun 2 2022, 7:19 AM · Restricted Project, Restricted Project, Restricted Project

May 31 2022

bmahjour accepted D124926: [LoopInterchange] New cost model for loop interchange.
May 31 2022, 12:47 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

LGTM. Thanks!

May 31 2022, 12:47 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

...With LNICM the load of z is again hoisted into entry, and since entry is the preheader of the original outer loop, after interchange load of z is moved into for.body3.preheader, which is the new outer loop preheader....before this patch, the entry block is not considered part of the loop ...

May 31 2022, 11:06 AM · Restricted Project, Restricted Project, Restricted Project

May 30 2022

bmahjour added inline comments to D124926: [LoopInterchange] New cost model for loop interchange.
May 30 2022, 9:24 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D124926: [LoopInterchange] New cost model for loop interchange.
May 30 2022, 9:13 AM · Restricted Project, Restricted Project, Restricted Project

May 26 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Other possible fixes include modifying the gep instructions in Transforms/LICM/lnicm.ll to make mem accesses delinearizable and analyzable. I'd be fine with any possible solution. I'd appreciate it if you could let me know your thoughts? @bmahjour

May 26 2022, 1:53 PM · Restricted Project, Restricted Project, Restricted Project

May 25 2022

bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.
May 25 2022, 2:13 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110973: [DA] Handle mismatching loop levels by considering them non-linear.

The following crashes even with this patch:

May 25 2022, 2:12 PM · Restricted Project, Restricted Project
bmahjour accepted D124984: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order.

LGTM

May 25 2022, 10:09 AM · Restricted Project, Restricted Project, Restricted Project

May 24 2022

bmahjour added a comment to D124984: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order.

Hi Bardia @bmahjour, after some thoughts I think regarding this patch we would chase a longer-term solution. For now I think I could rework and change this patch (both title and content) to an NFC patch that only updates the test cases, i.e., from using CHECK to CHECK-NEXT, since we really want to make sure that the output is ordered and the order is correct. I'm wondering how you think about it?

May 24 2022, 12:56 PM · Restricted Project, Restricted Project, Restricted Project

May 19 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Other than the inline comments looks good to me.

May 19 2022, 7:45 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour accepted D125810: [LV] Do not LoopSimplify/LCSSA after generating main vector loop..

Ok, LGTM. Thanks!

May 19 2022, 7:02 AM · Restricted Project, Restricted Project

May 17 2022

bmahjour added a comment to D125810: [LV] Do not LoopSimplify/LCSSA after generating main vector loop..

IIRC the reason for calling formLCSSARecursively was to make sure we can handle live outs and reductions properly, which we seem to still be able to do (based on LIT test updates). The call to simplifyLoop was to ensure we don't trip any assumptions about loop simplify form when processing the epilogue loop. It may not be an issue now, but would it be a problem, say if a hypothetical utility function common to both paths wants to assert that the loop is in simplified form? Is keeping simplifyLoop() harmless to VPlan's modeling of exit values?

May 17 2022, 1:43 PM · Restricted Project, Restricted Project

May 16 2022

bmahjour added a comment to D124984: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order.

Although the problem reported in PR55233 is worsened by D123400, the underlying issue existed even before that (as illustrated by the example in PR55233), so this patch won't completely solve the overflow problem, although it would make it a bit less likely to occur. I think a proper solution to PR55233 would need to address the overflow problem itself (eg by increasing the range of values that the cost model can represent).

May 16 2022, 10:02 AM · Restricted Project, Restricted Project, Restricted Project

May 13 2022

bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Fix test2 to make sure when D71539 is applied, it fails without this patch and passes with it.

May 13 2022, 2:15 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110973: [DA] Handle mismatching loop levels by considering them non-linear.

I was expecting that we only need to call getLoopDisposition once at the beginning of checkSubscript and then decide depending on whether it is a SCEVAddRecExpr, instead of

if (!AddRec)
  return isLoopInvariant(Expr, LoopNest);

Is this possible?

May 13 2022, 12:42 PM · Restricted Project, Restricted Project

May 5 2022

bmahjour added a project to D75628: [DA] [SCEV] Provide facility to check for total ordering based on dominance: Restricted Project.
May 5 2022, 12:21 PM · Restricted Project, Restricted Project, Restricted Project

May 4 2022

bmahjour added inline comments to D110973: [DA] Handle mismatching loop levels by considering them non-linear.
May 4 2022, 12:06 PM · Restricted Project, Restricted Project

May 3 2022

bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Used getLoopDisposition and improved tests.

May 3 2022, 3:19 PM · Restricted Project, Restricted Project
bmahjour abandoned D110972: [DA] Handle mismatching loop levels by updating the numbering scheme.
May 3 2022, 3:16 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110972: [DA] Handle mismatching loop levels by updating the numbering scheme.

@Meinersbur Thanks for the test case that fails without D71539. The test case I reported was bugpoint reduced and that's why the undefs are there. I'll update D110973 with a test that doesn't use undef and that passes today but would fail with D71539. I'll also add your test to it.

May 3 2022, 3:16 PM · Restricted Project, Restricted Project
bmahjour accepted D122776: [NFC][LoopCacheAnalysis] Add a motivating test case for improved loop cache analysis cost calculation.

LGTM

May 3 2022, 9:08 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D122776: [NFC][LoopCacheAnalysis] Add a motivating test case for improved loop cache analysis cost calculation.
May 3 2022, 7:20 AM · Restricted Project, Restricted Project, Restricted Project

May 2 2022

bmahjour committed rG363b3a645a1e: fix warning caused by ef4ecc3ceffcf3ef129640c813f823c974f9ba22 (authored by bmahjour).
fix warning caused by ef4ecc3ceffcf3ef129640c813f823c974f9ba22
May 2 2022, 2:09 PM · Restricted Project, Restricted Project
bmahjour committed rGef4ecc3ceffc: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when… (authored by bmahjour).
[LoopCacheAnalysis] Consider dimension depth of the subscript reference when…
May 2 2022, 1:51 PM · Restricted Project, Restricted Project
bmahjour closed D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.
May 2 2022, 1:51 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour updated the diff for D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

Sorry for the delay, as I didn't get a chance to commit this before I went on vacation.

May 2 2022, 11:55 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D124745: [Delinearization] Refactoring of fixed-size array delinearization.
May 2 2022, 8:40 AM · Restricted Project, Restricted Project, Restricted Project

Apr 20 2022

bmahjour updated the diff for D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

Added a test for the multi-store case I mentioned above.

Apr 20 2022, 9:39 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D71539: [SCEV] Look through trivial PHIs..

The last known crash has been fixed by now and I plan to re-commit this soon.

AFAIK the only remaining issue is with DependenceAnalysis. @bmahjour do you know if there has been any progress on the fixes you shared a while ago?

Apr 20 2022, 9:20 AM · Restricted Project, Restricted Project
bmahjour abandoned D71492: [SCEV] Generate AddRec for trivial and LCSSA phis outside of loop header (WIP).

This has been superseded by https://reviews.llvm.org/D71539

Apr 20 2022, 9:14 AM · Restricted Project, Restricted Project
bmahjour added a reviewer for D110973: [DA] Handle mismatching loop levels by considering them non-linear: artemrad.
Apr 20 2022, 9:12 AM · Restricted Project, Restricted Project
bmahjour added a reviewer for D110972: [DA] Handle mismatching loop levels by updating the numbering scheme: artemrad.
Apr 20 2022, 9:12 AM · Restricted Project, Restricted Project
bmahjour edited reviewers for D110973: [DA] Handle mismatching loop levels by considering them non-linear, added: fhahn; removed: Florian.
Apr 20 2022, 8:54 AM · Restricted Project, Restricted Project
bmahjour edited reviewers for D110972: [DA] Handle mismatching loop levels by updating the numbering scheme, added: fhahn; removed: Florian.
Apr 20 2022, 8:54 AM · Restricted Project, Restricted Project
bmahjour added reviewers for D110973: [DA] Handle mismatching loop levels by considering them non-linear: Meinersbur, Florian, Restricted Project.
Apr 20 2022, 8:49 AM · Restricted Project, Restricted Project
bmahjour added reviewers for D110972: [DA] Handle mismatching loop levels by updating the numbering scheme: Meinersbur, Florian, Restricted Project.
Apr 20 2022, 8:49 AM · Restricted Project, Restricted Project

Apr 19 2022

bmahjour added a comment to D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

What I did is to take the stride into account as a second component, and for each loop take the maximum stride of all reference groups to get the final stride, which presumably could resolve the motivating problem too.

Treating stride as a secondary component is what I respectfully objected to, and explained earlier. Not sure if taking the maximum stride would give us what we need. For example consider

for (i)
  for (j)
    for (k)
       ... A[i][j][k]
       ... B[i][k][j]
       ... C[i][k][j]

the maximum stride will be the same for both i-j-k and i-k-j configurations (despite the second one being more profitable) bringing us back to the original problem.

IMHO for this case the cost of loop-k would be higher than loop-j (remember that we compare the cost first and then stride). So loop cache analysis does output the i-k-j pattern.

Apr 19 2022, 1:34 PM · Restricted Project, Restricted Project, Restricted Project

Apr 18 2022

bmahjour added a comment to D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

What I did is to take the stride into account as a second component, and for each loop take the maximum stride of all reference groups to get the final stride, which presumably could resolve the motivating problem too.

Apr 18 2022, 1:52 PM · Restricted Project, Restricted Project, Restricted Project