Page MenuHomePhabricator

bmahjour (Bardia Mahjour)
User

Projects

User does not belong to any projects.

User Details

User Since
May 3 2019, 8:38 AM (165 w, 3 d)

Recent Activity

Wed, Jun 29

bmahjour accepted D128653: [PowerPC] Fix the check for scalar MASS conversion.

LGTM. Thanks!

Wed, Jun 29, 2:40 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D128755: [VPlan] Make sure optimizeInductions removes wide ind from scalar plan..

Address latest comments, thanks!

could you please add the crashing IR from https://reviews.llvm.org/D123720 as an additional test? It still seems to crash with this patch!

Is it still crashing with this patch applied to current main? It needs b18141a8f29f638be0602aa6ffbfc2d434b0b74a as well.

Wed, Jun 29, 9:18 AM · Restricted Project, Restricted Project

Tue, Jun 28

bmahjour added a comment to D128755: [VPlan] Make sure optimizeInductions removes wide ind from scalar plan..

could you please add the crashing IR from https://reviews.llvm.org/D123720 as an additional test? It still seems to crash with this patch!

Tue, Jun 28, 3:39 PM · Restricted Project, Restricted Project
bmahjour added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Note this is also related to rG85983ca42ec6102b1692d0451090cacd002c958f, so simply reverting the above commit won't fix the case I quoted above.

Thanks, I put up D128755 to fix it by making sure that we replace all widen-inductions with scalar steps for plans with scalar VFs.

Tue, Jun 28, 3:38 PM · Restricted Project, Restricted Project
bmahjour added inline comments to D128755: [VPlan] Make sure optimizeInductions removes wide ind from scalar plan..
Tue, Jun 28, 3:36 PM · Restricted Project, Restricted Project
bmahjour added inline comments to D128653: [PowerPC] Fix the check for scalar MASS conversion.
Tue, Jun 28, 2:46 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Mon, Jun 27

bmahjour added inline comments to D128653: [PowerPC] Fix the check for scalar MASS conversion.
Mon, Jun 27, 2:17 PM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D128653: [PowerPC] Fix the check for scalar MASS conversion.

The description should explain why this change is necessary.

Mon, Jun 27, 9:43 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Fri, Jun 24

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

The cost overflow is a known issue (see https://github.com/llvm/llvm-project/issues/55233). I wonder if specifying a larger cache line size (eg. -cache-line-size=128 or 256 ) would workaround the ubsan failures for now. Otherwise we'd have to fix 55233 before recommitting this, I guess.

Fri, Jun 24, 7:45 AM · Restricted Project, Restricted Project, Restricted Project

Thu, Jun 23

bmahjour accepted D124926: [LoopInterchange] New cost model for loop interchange.

LGTM. Thanks!

Thu, Jun 23, 10:20 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

The users of WIDEN-INDUCTION in the above recipe are the FIRST-ORDER-RECURRENCE-PHI ir<%prev.011> = phi ir<0>, ir<%indvars15> and EMIT vp<%6> = first-order splice ir<%prev.011> ir<%indvars15>. The first one is a VPFirstOrderRecurrencePHIRecipe and the second one is a VPInstruction, neither of which override usesScalars to return true. It doesn't seem safe to make useScalars return true for these types of recipes, so perhaps we still need to check IV->needsVectorIV()?

Thu, Jun 23, 9:40 AM · Restricted Project, Restricted Project

Wed, Jun 22

bmahjour added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

Hi @fhahn , this looks like a regression from your commit below:

Wed, Jun 22, 12:17 PM · Restricted Project, Restricted Project

Thu, Jun 16

bmahjour accepted D127342: [TargetTransformInfo] Added an option for the cache line size.
Thu, Jun 16, 11:37 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Thu, Jun 16, 10:49 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a reviewer for D127342: [TargetTransformInfo] Added an option for the cache line size: sfertile.
Thu, Jun 16, 10:48 AM · Restricted Project, Restricted Project, Restricted Project

Fri, Jun 10

bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Fri, Jun 10, 11:41 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D127342: [TargetTransformInfo] Added an option for the cache line size.

This option would also be available in llc, so the following in the description isn't quite accurate:

Fri, Jun 10, 10:44 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Fri, Jun 10, 10:38 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a project to D125847: LTO: Add option to initialize with opaque/non-opaque pointer types: Unknown Object (Project).
Fri, Jun 10, 8:18 AM · Unknown Object (Project), Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D125847: LTO: Add option to initialize with opaque/non-opaque pointer types.
Fri, Jun 10, 8:18 AM · Unknown Object (Project), Restricted Project, Restricted Project, Restricted Project

Thu, Jun 9

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Thanks for the comment, I see your concern. Would you like me to continue D127342, or would you like me to still keep the cache line size option under LoopCacheCost.cpp as we did in this patch? To avoid breaking computers with no cache, I could update CLS in LoopCacheCost.cpp to the following (I may need to change to wording a little bit) :

CLS = DefaultCacheLineSize.getNumOccurrences() > 0  ? DefaultCacheLineSize : TTI.getCacheLineSize()

and pass -cache-line-size=64 to all loop interchange tests. This way the buildbot test failures can be avoided, and in the meantime for a realworld computer with no cache it won't break.

Thu, Jun 9, 1:08 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D127342: [TargetTransformInfo] Added an option for the cache line size.
Thu, Jun 9, 9:50 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Should the setting be directly be applied in TTI::getCacheLineSize

Thu, Jun 9, 9:48 AM · Restricted Project, Restricted Project, Restricted Project

Wed, Jun 8

bmahjour added a comment to D71539: [SCEV] Look through trivial PHIs..

https://reviews.llvm.org/D110973 has landed. This should be safe to re-commit now. Thanks!

Wed, Jun 8, 1:45 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Thanks for pushing the fix through! I think this should unblock D71539 now.

Wed, Jun 8, 1:40 PM · Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Some additional comments:

Wed, Jun 8, 8:43 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour committed rGc9677f6db4b2: [DA] Handle mismatching loop levels by considering them non-linear (authored by bmahjour).
[DA] Handle mismatching loop levels by considering them non-linear
Wed, Jun 8, 8:16 AM · Restricted Project, Restricted Project
bmahjour closed D110973: [DA] Handle mismatching loop levels by considering them non-linear.
Wed, Jun 8, 8:16 AM · Restricted Project, Restricted Project

Mon, Jun 6

bmahjour added a comment to D123408: [InstCombine] Limit folding of cast into PHI.

The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form <op>-then-<lop> (eg RecurKind::AddThenAnd, RecurKind::MulThenAnd, RecurKind::OrThenAnd, and so on)...and that's just for combining logical and with the known integer reduction ops, so if we want to support e.g. or we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the <op>, and the <lop> has to be applied when reducing the final vector into a single scalar upon loop exit.

@lebedev.ri is this what you had in mind or is there a better way to do it?

@lebedev.ri Can you please advise if the above described way is how we would implement this within the LoopVectorizer?

Sorry, lost track here. I'm not familiar enough with LV to recommend the solution,
but it sounds vaguely reasonable to me. But, do you need the whole <op>-then-<lop> generality?
The only reason why <op>-then-and is useful, is because that and specifies
the effective bitwidth of the reduction, but if the high bits aren't demanded arithmetic/logic ops can be losslessly performed in narrower bit widths:
i32 65535 + i32 65535 = i32 131070 = 0x1FFFE, (trunc(i32 65535) to i8) + (trunc(i32 65535) to i8) = i8 510 = 0xFE, note how low 8 bits are the same.
Perhaps the solution should be around tracking the demanded bit width?

Mon, Jun 6, 10:44 AM · Restricted Project, Restricted Project

Jun 3 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

For the ones that had x86 previously I can reset it to x86, and for the ones that do not have a target triple I can remove it.

Jun 3 2022, 7:58 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Having all tests depend on a single target is not ideal, because this limits the number of bots/configurations that run the tests. I think we need a way to keep most of the tests independent of the PowerPC or any other target and only have tests that explicitly test the cost-modeling depend on the target.

Jun 3 2022, 7:15 AM · Restricted Project, Restricted Project, Restricted Project

Jun 2 2022

bmahjour added inline comments to D110973: [DA] Handle mismatching loop levels by considering them non-linear.
Jun 2 2022, 1:35 PM · Restricted Project, Restricted Project
bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Rebased and added Michael's test case.

Jun 2 2022, 1:34 PM · Restricted Project, Restricted Project
bmahjour added a comment to D109584: [VP] Implementing expansion pass for VP load and store..

Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)

Jun 2 2022, 7:19 AM · Restricted Project, Restricted Project, Restricted Project

May 31 2022

bmahjour accepted D124926: [LoopInterchange] New cost model for loop interchange.
May 31 2022, 12:47 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

LGTM. Thanks!

May 31 2022, 12:47 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

...With LNICM the load of z is again hoisted into entry, and since entry is the preheader of the original outer loop, after interchange load of z is moved into for.body3.preheader, which is the new outer loop preheader....before this patch, the entry block is not considered part of the loop ...

May 31 2022, 11:06 AM · Restricted Project, Restricted Project, Restricted Project

May 30 2022

bmahjour added inline comments to D124926: [LoopInterchange] New cost model for loop interchange.
May 30 2022, 9:24 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D124926: [LoopInterchange] New cost model for loop interchange.
May 30 2022, 9:13 AM · Restricted Project, Restricted Project, Restricted Project

May 26 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Other possible fixes include modifying the gep instructions in Transforms/LICM/lnicm.ll to make mem accesses delinearizable and analyzable. I'd be fine with any possible solution. I'd appreciate it if you could let me know your thoughts? @bmahjour

May 26 2022, 1:53 PM · Restricted Project, Restricted Project, Restricted Project

May 25 2022

bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.
May 25 2022, 2:13 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110973: [DA] Handle mismatching loop levels by considering them non-linear.

The following crashes even with this patch:

May 25 2022, 2:12 PM · Restricted Project, Restricted Project
bmahjour accepted D124984: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order.

LGTM

May 25 2022, 10:09 AM · Restricted Project, Restricted Project, Restricted Project

May 24 2022

bmahjour added a comment to D124984: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order.

Hi Bardia @bmahjour, after some thoughts I think regarding this patch we would chase a longer-term solution. For now I think I could rework and change this patch (both title and content) to an NFC patch that only updates the test cases, i.e., from using CHECK to CHECK-NEXT, since we really want to make sure that the output is ordered and the order is correct. I'm wondering how you think about it?

May 24 2022, 12:56 PM · Restricted Project, Restricted Project, Restricted Project

May 19 2022

bmahjour added a comment to D124926: [LoopInterchange] New cost model for loop interchange.

Other than the inline comments looks good to me.

May 19 2022, 7:45 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour accepted D125810: [LV] Do not LoopSimplify/LCSSA after generating main vector loop..

Ok, LGTM. Thanks!

May 19 2022, 7:02 AM · Restricted Project, Restricted Project

May 17 2022

bmahjour added a comment to D125810: [LV] Do not LoopSimplify/LCSSA after generating main vector loop..

IIRC the reason for calling formLCSSARecursively was to make sure we can handle live outs and reductions properly, which we seem to still be able to do (based on LIT test updates). The call to simplifyLoop was to ensure we don't trip any assumptions about loop simplify form when processing the epilogue loop. It may not be an issue now, but would it be a problem, say if a hypothetical utility function common to both paths wants to assert that the loop is in simplified form? Is keeping simplifyLoop() harmless to VPlan's modeling of exit values?

May 17 2022, 1:43 PM · Restricted Project, Restricted Project

May 16 2022

bmahjour added a comment to D124984: [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order.

Although the problem reported in PR55233 is worsened by D123400, the underlying issue existed even before that (as illustrated by the example in PR55233), so this patch won't completely solve the overflow problem, although it would make it a bit less likely to occur. I think a proper solution to PR55233 would need to address the overflow problem itself (eg by increasing the range of values that the cost model can represent).

May 16 2022, 10:02 AM · Restricted Project, Restricted Project, Restricted Project

May 13 2022

bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Fix test2 to make sure when D71539 is applied, it fails without this patch and passes with it.

May 13 2022, 2:15 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110973: [DA] Handle mismatching loop levels by considering them non-linear.

I was expecting that we only need to call getLoopDisposition once at the beginning of checkSubscript and then decide depending on whether it is a SCEVAddRecExpr, instead of

if (!AddRec)
  return isLoopInvariant(Expr, LoopNest);

Is this possible?

May 13 2022, 12:42 PM · Restricted Project, Restricted Project

May 5 2022

bmahjour added a project to D75628: [DA] [SCEV] Provide facility to check for total ordering based on dominance: Restricted Project.
May 5 2022, 12:21 PM · Restricted Project, Restricted Project, Restricted Project

May 4 2022

bmahjour added inline comments to D110973: [DA] Handle mismatching loop levels by considering them non-linear.
May 4 2022, 12:06 PM · Restricted Project, Restricted Project

May 3 2022

bmahjour updated the diff for D110973: [DA] Handle mismatching loop levels by considering them non-linear.

Used getLoopDisposition and improved tests.

May 3 2022, 3:19 PM · Restricted Project, Restricted Project
bmahjour abandoned D110972: [DA] Handle mismatching loop levels by updating the numbering scheme.
May 3 2022, 3:16 PM · Restricted Project, Restricted Project
bmahjour added a comment to D110972: [DA] Handle mismatching loop levels by updating the numbering scheme.

@Meinersbur Thanks for the test case that fails without D71539. The test case I reported was bugpoint reduced and that's why the undefs are there. I'll update D110973 with a test that doesn't use undef and that passes today but would fail with D71539. I'll also add your test to it.

May 3 2022, 3:16 PM · Restricted Project, Restricted Project
bmahjour accepted D122776: [NFC][LoopCacheAnalysis] Add a motivating test case for improved loop cache analysis cost calculation.

LGTM

May 3 2022, 9:08 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D122776: [NFC][LoopCacheAnalysis] Add a motivating test case for improved loop cache analysis cost calculation.
May 3 2022, 7:20 AM · Restricted Project, Restricted Project, Restricted Project

May 2 2022

bmahjour committed rG363b3a645a1e: fix warning caused by ef4ecc3ceffcf3ef129640c813f823c974f9ba22 (authored by bmahjour).
fix warning caused by ef4ecc3ceffcf3ef129640c813f823c974f9ba22
May 2 2022, 2:09 PM · Restricted Project, Restricted Project
bmahjour committed rGef4ecc3ceffc: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when… (authored by bmahjour).
[LoopCacheAnalysis] Consider dimension depth of the subscript reference when…
May 2 2022, 1:51 PM · Restricted Project, Restricted Project
bmahjour closed D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.
May 2 2022, 1:51 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour updated the diff for D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

Sorry for the delay, as I didn't get a chance to commit this before I went on vacation.

May 2 2022, 11:55 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D124745: [Delinearization] Refactoring of fixed-size array delinearization.
May 2 2022, 8:40 AM · Restricted Project, Restricted Project, Restricted Project

Apr 20 2022

bmahjour updated the diff for D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

Added a test for the multi-store case I mentioned above.

Apr 20 2022, 9:39 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D71539: [SCEV] Look through trivial PHIs..

The last known crash has been fixed by now and I plan to re-commit this soon.

AFAIK the only remaining issue is with DependenceAnalysis. @bmahjour do you know if there has been any progress on the fixes you shared a while ago?

Apr 20 2022, 9:20 AM · Restricted Project, Restricted Project
bmahjour abandoned D71492: [SCEV] Generate AddRec for trivial and LCSSA phis outside of loop header (WIP).

This has been superseded by https://reviews.llvm.org/D71539

Apr 20 2022, 9:14 AM · Restricted Project, Restricted Project
bmahjour added a reviewer for D110973: [DA] Handle mismatching loop levels by considering them non-linear: artemrad.
Apr 20 2022, 9:12 AM · Restricted Project, Restricted Project
bmahjour added a reviewer for D110972: [DA] Handle mismatching loop levels by updating the numbering scheme: artemrad.
Apr 20 2022, 9:12 AM · Restricted Project, Restricted Project
bmahjour edited reviewers for D110973: [DA] Handle mismatching loop levels by considering them non-linear, added: fhahn; removed: Florian.
Apr 20 2022, 8:54 AM · Restricted Project, Restricted Project
bmahjour edited reviewers for D110972: [DA] Handle mismatching loop levels by updating the numbering scheme, added: fhahn; removed: Florian.
Apr 20 2022, 8:54 AM · Restricted Project, Restricted Project
bmahjour added reviewers for D110973: [DA] Handle mismatching loop levels by considering them non-linear: Meinersbur, Florian, Restricted Project.
Apr 20 2022, 8:49 AM · Restricted Project, Restricted Project
bmahjour added reviewers for D110972: [DA] Handle mismatching loop levels by updating the numbering scheme: Meinersbur, Florian, Restricted Project.
Apr 20 2022, 8:49 AM · Restricted Project, Restricted Project

Apr 19 2022

bmahjour added a comment to D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

What I did is to take the stride into account as a second component, and for each loop take the maximum stride of all reference groups to get the final stride, which presumably could resolve the motivating problem too.

Treating stride as a secondary component is what I respectfully objected to, and explained earlier. Not sure if taking the maximum stride would give us what we need. For example consider

for (i)
  for (j)
    for (k)
       ... A[i][j][k]
       ... B[i][k][j]
       ... C[i][k][j]

the maximum stride will be the same for both i-j-k and i-k-j configurations (despite the second one being more profitable) bringing us back to the original problem.

IMHO for this case the cost of loop-k would be higher than loop-j (remember that we compare the cost first and then stride). So loop cache analysis does output the i-k-j pattern.

Apr 19 2022, 1:34 PM · Restricted Project, Restricted Project, Restricted Project

Apr 18 2022

bmahjour added a comment to D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.

What I did is to take the stride into account as a second component, and for each loop take the maximum stride of all reference groups to get the final stride, which presumably could resolve the motivating problem too.

Apr 18 2022, 1:52 PM · Restricted Project, Restricted Project, Restricted Project

Apr 11 2022

bmahjour added a comment to D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.
  1. Boiling down various parameters to one final "cost" value has a disadvantage. In fact we discussed this with @congzhe as the first solution, but we discarded it because essentially you lose some informaiton when you boil everything down to one number. The reason that Congzhe decided to go with a two component cost was that extra detail, carries more useful information and helps to make more accurate decision. Now this was a theoretical concern and I am not sure if in practice it holds or not. But that is the background on congzhe's work.
Apr 11 2022, 2:30 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added inline comments to D122857: [LoopCacheAnalysis] Enable delinearization of fixed sized arrays.
Apr 11 2022, 2:07 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D122857: [LoopCacheAnalysis] Enable delinearization of fixed sized arrays.

@nikic it may be a good idea to have a discussion about this (impact of opaque pointers on delinearization) at one of the upcoming LoopOptWG calls. If you care to join us, please let me know or add an agenda item to https://docs.google.com/document/d/1sdzoyB11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit so we can discuss this in more depth. Thanks!

Apr 11 2022, 1:40 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D122857: [LoopCacheAnalysis] Enable delinearization of fixed sized arrays.

You mention that the SCEV-based approach cannot handle fixed-sized arrays. Could you please explain in more detail why? It's not really obvious to me.

Apr 11 2022, 1:23 PM · Restricted Project, Restricted Project, Restricted Project
bmahjour added a comment to D123408: [InstCombine] Limit folding of cast into PHI.

The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form <op>-then-<lop> (eg RecurKind::AddThenAnd, RecurKind::MulThenAnd, RecurKind::OrThenAnd, and so on)...and that's just for combining logical and with the known integer reduction ops, so if we want to support e.g. or we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the <op>, and the <lop> has to be applied when reducing the final vector into a single scalar upon loop exit.

Apr 11 2022, 12:13 PM · Restricted Project, Restricted Project

Apr 8 2022

bmahjour updated subscribers of D122776: [NFC][LoopCacheAnalysis] Add a motivating test case for improved loop cache analysis cost calculation.

Thanks for looking into this @congzhe . As you described we don't seem to be able to distinguish between the cost of moving outer loops in nests more than 2 levels deep. The data locality paper tries to estimate the cost based on the estimated number of cache lines used when moving a loop into innermost level. Since the stride information (when it's less than the cache line size) is already used in RefCost functions, it makes more sense to somehow integrate it into the cost formula when it's larger than the cache line size. Treating stride as a separate component of the cost and associating it with the loop (as opposed to each reference group) makes the result of the analysis more complicated to consume and can cause ambiguity when dealing with multiple reference groups.

Apr 8 2022, 9:34 AM · Restricted Project, Restricted Project, Restricted Project
bmahjour requested review of D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost.
Apr 8 2022, 9:18 AM · Restricted Project, Restricted Project, Restricted Project

Apr 1 2022

bmahjour accepted D120343: [docs] Add Loop Opt WG meeting ics..

LGTM. Thanks.

Apr 1 2022, 7:11 AM · Restricted Project, Restricted Project
bmahjour added a reviewer for D120343: [docs] Add Loop Opt WG meeting ics.: bmahjour.
Apr 1 2022, 7:11 AM · Restricted Project, Restricted Project

Mar 9 2022

bmahjour accepted D118102: [LoopInterchange] Detect output dependency of a store instruction with itself.

With the recent updates we are no longer losing test coverage. Per discussion in the Loop Opt WG call, I'll approve to get this in as a small incremental improvement. I'll look into https://github.com/llvm/llvm-project/issues/54176 and follow up with more patches.

Mar 9 2022, 9:04 AM · Restricted Project, Restricted Project, Restricted Project

Mar 7 2022

bmahjour added inline comments to D118102: [LoopInterchange] Detect output dependency of a store instruction with itself.
Mar 7 2022, 12:29 PM · Restricted Project, Restricted Project, Restricted Project

Mar 3 2022

bmahjour added inline comments to D118102: [LoopInterchange] Detect output dependency of a store instruction with itself.
Mar 3 2022, 6:45 AM · Restricted Project, Restricted Project, Restricted Project

Mar 2 2022

bmahjour requested changes to D118102: [LoopInterchange] Detect output dependency of a store instruction with itself.
Mar 2 2022, 6:56 AM · Restricted Project, Restricted Project, Restricted Project

Feb 25 2022

bmahjour added a comment to D119261: [DependenceAnalysis][PR52170] Conservative crash on overflowed loop backedge.

I'm generally in favour of this patch, but just not sure if there is (or should be) a guarantee that SCEV will NOT produce an AddRec in overflow situations. For example when a GEP has the inbounds attribute, the produced address after adding the index/offset cannot cause an unsigned overflow, so not sure why SCEV would produce different results depending on whether the index/offset could wrap! SCEV seems to produce an AddRec for the offset/index (eg. %i.add) regardless whether it could wrap or not. It's just the GEP that seems to be behaving the way you described.

Feb 25 2022, 9:45 AM · Restricted Project, Restricted Project
bmahjour added a comment to D109584: [VP] Implementing expansion pass for VP load and store..

hello @hussainjk @bmahjour, are you still working on this? Otherwise, I may take it over on your behalf.

Feb 25 2022, 9:14 AM · Restricted Project, Restricted Project, Restricted Project

Feb 23 2022

bmahjour added a comment to D120343: [docs] Add Loop Opt WG meeting ics..

Thanks for this @Meinersbur . Could we replace "https://ibm.webex.com/ibm/j.php?MTID=mad5dd5ca19e0423759b0c507245e4ccc" everywhere it occurs with "https://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6", so that the meeting link is consistent with the info in the google doc.

Feb 23 2022, 12:55 PM · Restricted Project, Restricted Project

Feb 8 2022

bmahjour added a comment to D114779: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`.

Even though the initial intention of the "hack" might have been to prevent predication on some X86 subtargets, it was added in a way that affected all targets, so it could expose cost model issues on various targets if suddenly removed. I wasn't aware of this and just noticed it....has any performance runs been done on other targets (eg. Power)?

Feb 8 2022, 2:44 PM · Restricted Project
bmahjour added inline comments to D119261: [DependenceAnalysis][PR52170] Conservative crash on overflowed loop backedge.
Feb 8 2022, 10:31 AM · Restricted Project, Restricted Project

Feb 4 2022

bmahjour accepted D119035: [PowerPC] Option controling scalar MASS convertion.
Feb 4 2022, 1:04 PM · Restricted Project

Feb 1 2022

bmahjour accepted D101759: [PowerPC] Scalar IBM MASS library conversion pass.

Apart from some minor inline comments this revision addresses all my outstanding comments. LGTM.

Feb 1 2022, 11:21 AM · Restricted Project, Restricted Project, Restricted Project

Jan 31 2022

bmahjour added a comment to D118102: [LoopInterchange] Detect output dependency of a store instruction with itself.

Couldn't the same problem happen in theory without control flow divergence? For example consider a loop like this:

Jan 31 2022, 12:30 PM · Restricted Project, Restricted Project, Restricted Project

Jan 27 2022

bmahjour added inline comments to D101759: [PowerPC] Scalar IBM MASS library conversion pass.
Jan 27 2022, 2:04 PM · Restricted Project, Restricted Project, Restricted Project

Jan 17 2022

bmahjour added a comment to D116479: [VPlan] Introduce and use BranchOnCount VPInstruction..

This commit somehow causes the "unroll.disable" metadata to be dropped from the vector epilogue loops. For example:

Thanks for the heads-up. Looks like unfortunately the relevant check lines to guard against the regression have been removed earlier. It should be fixed by 070d1034da87. I also tried to restore the original check lines removed by 278aa65cc495

Jan 17 2022, 11:09 AM · Restricted Project

Jan 14 2022

bmahjour added inline comments to D93793: [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder.
Jan 14 2022, 11:52 AM · Restricted Project, Restricted Project
bmahjour added a comment to D116479: [VPlan] Introduce and use BranchOnCount VPInstruction..

This commit somehow causes the "unroll.disable" metadata to be dropped from the vector epilogue loops. For example:

Jan 14 2022, 11:18 AM · Restricted Project
bmahjour added inline comments to D93793: [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder.
Jan 14 2022, 11:03 AM · Restricted Project, Restricted Project
bmahjour added inline comments to D116928: [LoopVectorize] Support epilogue vectorisation of loops with reductions.
Jan 14 2022, 10:35 AM · Restricted Project