Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

simoll (Simon Moll)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 23 2017, 12:28 AM (347 w, 6 d)

Recent Activity

May 12 2023

simoll added a comment to D137142: [WIP] DivergenceAnalysis: Infer uniformity from assume calls.

For the DA in isolation, ideally, we'd have something like:

%Y = llvm.assume.uniform(%X)
foo(%Y) ; <- Rewritten to use %Y instead of %X.

This is roughly what the target-specific @llvm.amdgcn.readfirstlane does today, and some frontends do use it to assert and/or enforce uniformity of particular values. There is some sublety about exactly what it means (or exactly what @llvm.assume.uniform should mean): Read the first active lane? Read an arbitrary active lane? Undefined/poison if active lanes do not all have the same value?

Read all active lanes. The intrinsic only tells us that we can assume uniformity among the active lanes in each instance, it could not be used to enforce it. Not so sure about the values on inactive lanes, I'd say it simply passes through the incoming values.. you may just want poison here though..

We do need to say what happens if the assumptions is wrong. I believe at a minimum we need to say that the result is poison, because of what happens when the result feeds into a conditional branch: divergence analysis uses the assumption, which can affect codegen. So UB on that branch if the assumption is wrong seems like the minimum we need.

Though immediate UB is a legitimate alternative, since it would allow us to replace other uses of %X by %Y.

May 12 2023, 3:25 AM · Restricted Project, Restricted Project

May 10 2023

simoll added a comment to D142351: [VP][DAGCombiner][NFC] Precommit test for combine for VP_FMA..

Can we abandon this now that D141891 has landed?

May 10 2023, 5:55 AM · Restricted Project, Restricted Project

May 9 2023

simoll added a comment to D149916: [VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic SelectionDAG support..

I suppose this makes sense for ARM MVE/SVE as well?
Generally, we should simplify this in IR for other targets and/or tripcounts with known multiples: otherwise the VP expansion pass will create %evl-to-%mask expansion code not knowing that the %evl is ineffective (eg %evl == vector_length).

May 9 2023, 8:19 AM · Restricted Project, Restricted Project
simoll added a comment to D149855: [DAGCombiner] Avoid template for generalized pattern match..

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

I raised it an internal SiFive discussion. I agree a virtual dispatch and a heap allocation are not an improvement. We haven't applied this generic matcher to much code yet and if we started applying it aggressively we may double the size of DAGCombiner.cpp.o. So I thought it was worth thinking about alternative abstractions that didn't duplicate entire functions. I just don't have any good ideas yet.

You don't need to rewrite the templated code to have the option for virtual dispatch in the future:
If the need arises, we could implement one SuperMatchContext that defers to EmptyContext and VPMatchContext internally (via subclasses or otherwise) and only instantiate the templates for SuperMatchContext once.
(If you go down this route, there is a bunch of cool things you can do, eg combining different matchers or running matcher code with multiple contexts in parallel).

May 9 2023, 7:23 AM · Restricted Project, Restricted Project

May 8 2023

simoll added a comment to D149855: [DAGCombiner] Avoid template for generalized pattern match..

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

May 8 2023, 1:16 AM · Restricted Project, Restricted Project

Apr 24 2023

simoll accepted D149052: [VP] IR expansion for fabs/fsqrt/fma/fmadd.
Apr 24 2023, 4:53 AM · Restricted Project, Restricted Project

Feb 13 2023

simoll added a comment to D143441: [clang][VE] Change to use only use major version in resource dir.

LGTM. I am no longer the maintainer of the VE bot, though.

Feb 13 2023, 1:25 PM · Restricted Project

Jan 13 2023

simoll added a comment to D137142: [WIP] DivergenceAnalysis: Infer uniformity from assume calls.

For the DA in isolation, ideally, we'd have something like:

%Y = llvm.assume.uniform(%X)
foo(%Y) ; <- Rewritten to use %Y instead of %X.

This is roughly what the target-specific @llvm.amdgcn.readfirstlane does today, and some frontends do use it to assert and/or enforce uniformity of particular values. There is some sublety about exactly what it means (or exactly what @llvm.assume.uniform should mean): Read the first active lane? Read an arbitrary active lane? Undefined/poison if active lanes do not all have the same value?

Jan 13 2023, 4:20 AM · Restricted Project, Restricted Project

Jan 12 2023

simoll added a comment to D137142: [WIP] DivergenceAnalysis: Infer uniformity from assume calls.

I don't quite see the point of this change. For test cases like @assume_ballot_eq_0, what we really should be doing here is optimize the branch away entirely because the llvm.assume implies that %cmp == 0.

It sounds like what we really want here is a sort of llvm.assume.uniform intrinsic. Or maybe an llvm.amdgcn.is.uniform intrinsic and then do llvm.assume(llvm.amdgcn.is.uniform)

Jan 12 2023, 2:30 AM · Restricted Project, Restricted Project

Sep 26 2022

simoll abandoned D50433: A New Divergence Analysis for LLVM.

The extension to irreducible control happens in D130746

Sep 26 2022, 12:42 AM · Restricted Project, Restricted Project
simoll accepted D134474: [VP][RISCV] Add vp.fmuladd..
Sep 26 2022, 12:40 AM · Restricted Project, Restricted Project

Sep 19 2022

simoll added a comment to D133967: [IRBuilder] Use PoisonValue in CreateMasked*.

I reviewed all users:

  • SLP Vectorizer: the mask is all ones, so passthru is irrelevant
  • Loop vectorizer: masked out lanes aren't used. In fact, masked loads are already using poison for passthru.
  • X86 expand intrinsic: uses custom zero passthru.
  • Codegen expand vp load/gather: there's an issue here as langref says passthru is undef.

So, vector folks, is it ok to change @llvm.vp.load/gather's passthru to poison instead of undef?

Yes and we should do this for all VP intrinsics. This also means we need to change the wording in the Semantics sections, eg https://llvm.org/docs/LangRef.html#id894 .

Sep 19 2022, 1:15 AM · Restricted Project, Restricted Project, Restricted Project

Aug 30 2022

simoll accepted D132972: [VP] Correct the LEGALPOS for VP_STORE..
Aug 30 2022, 11:46 PM · Restricted Project, Restricted Project

Aug 19 2022

simoll accepted D132210: [LangRef][VP] Fix typo..
Aug 19 2022, 1:15 AM · Restricted Project, Restricted Project

Aug 4 2022

simoll committed rG74940d266898: [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE (authored by loralb).
[VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE
Aug 4 2022, 7:15 AM · Restricted Project, Restricted Project
simoll closed D121114: [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE.
Aug 4 2022, 7:14 AM · Restricted Project, Restricted Project

Jul 20 2022

simoll added a comment to D121114: [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE.

Please rebase for commit

Jul 20 2022, 1:25 AM · Restricted Project, Restricted Project
simoll committed rG07d69d9fc904: [VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes (authored by loralb).
[VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes
Jul 20 2022, 1:24 AM · Restricted Project, Restricted Project
simoll closed D123112: [VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes.
Jul 20 2022, 1:23 AM · Restricted Project, Restricted Project

Jul 19 2022

simoll added a comment to D123112: [VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes.

I could not apply this patch. Please rebase again so i can commit.

Jul 19 2022, 2:57 AM · Restricted Project, Restricted Project

Jul 18 2022

simoll committed rGc00a44fa6839: [VP] IR expansion pass for VP gather and scatter (authored by loralb).
[VP] IR expansion pass for VP gather and scatter
Jul 18 2022, 8:01 AM · Restricted Project, Restricted Project
simoll closed D120664: [VP] IR expansion pass for VP gather and scatter.
Jul 18 2022, 8:01 AM · Restricted Project, Restricted Project

Jul 17 2022

simoll committed rGf390781cec5c: [VP] Implementing expansion pass for VP load and store. (authored by loralb).
[VP] Implementing expansion pass for VP load and store.
Jul 17 2022, 11:48 PM · Restricted Project, Restricted Project
simoll closed D109584: [VP] Implementing expansion pass for VP load and store..
Jul 17 2022, 11:48 PM · Restricted Project, Restricted Project, Restricted Project

Jul 14 2022

simoll updated the diff for D92086: Generalized PatternMatch & InstSimplify.

Rebase onto committed add_vp.ll test to better show improvement

Jul 14 2022, 4:31 AM · Restricted Project, Restricted Project
simoll committed rG173d4b84f614: [VP] Add test to show optimization opportunities (authored by simoll).
[VP] Add test to show optimization opportunities
Jul 14 2022, 3:37 AM · Restricted Project, Restricted Project
simoll closed D129746: [VP] Add test to show optimization opportunities.
Jul 14 2022, 3:37 AM · Restricted Project, Restricted Project, Restricted Project
simoll added inline comments to D129746: [VP] Add test to show optimization opportunities.
Jul 14 2022, 3:02 AM · Restricted Project, Restricted Project, Restricted Project
simoll requested review of D129746: [VP] Add test to show optimization opportunities.
Jul 14 2022, 3:00 AM · Restricted Project, Restricted Project, Restricted Project

Jun 27 2022

simoll added a comment to D92086: Generalized PatternMatch & InstSimplify.

We can have a cmake variable that controls instantation and if your distribution does not care about constrained fp or vp, say, you just don't instantiate for it and won't see compile time or size increases. I was hinting in that direction with the EnabledTraits.def file.

I don't think this makes a lot sense. There's no way we can disable this from the distribution side if constrained FP and VP are core parts of LLVM (which at least the former is at this point, given that it is exposed by Clang). If you want to do this you'll also have to export variables for llvm-lit, so tests can be disabled based on which traits are enabled. I don't think we want to go there.

I'd really like to have a working patch (with enabled traits) so I can give this a basic compile-time evaluation at least.

Well, What isn't working on your system? All templates should be instantiated now. Please share your build configuration because static/dylib/shared lib builds of LLVM work fine on my system.
The cmake variable (in the reference patch), let's you run compile time tests with different trait configurations: cmake -DLLVM_OPTIMIZER_TRAITS_TO_INSTANTIATE=all. Whether we actually want a cmake variable is a different story. However, there is precedent in the TARGETS_TO_BUILD cmake variable: Distributions can configure the enabled backends. Yet, there are target-specific intrinsics in every LLVM distribution. I wouldn't be surprised if some distributors removed those manually to make their builds smaller.

Jun 27 2022, 1:47 AM · Restricted Project, Restricted Project

Jun 24 2022

simoll updated the diff for D92086: Generalized PatternMatch & InstSimplify.
  • Fixed m_ExtractValue pattern (all tests passing now).
  • Add cmake variable to control which traits InstSimplify will be instantiated for (LLVM_OPTIMIZER_TRAITS_TO_INSTANTIATE .. eg VPTrait;CFPTrait).
  • Explicitly instantiate InstSimplify templates.
Jun 24 2022, 11:09 AM · Restricted Project, Restricted Project

Jun 22 2022

simoll updated the diff for D92086: Generalized PatternMatch & InstSimplify.
  • inline -> static inline
  • rebased
Jun 22 2022, 7:54 AM · Restricted Project, Restricted Project

Jun 10 2022

simoll added a comment to D92086: Generalized PatternMatch & InstSimplify.

This is really on two separate things: First on generalized pattern matching, the second on vector predication (which is only one of the intrinsics sets benefiting from this).

Jun 10 2022, 4:25 AM · Restricted Project, Restricted Project
simoll added inline comments to D92086: Generalized PatternMatch & InstSimplify.
Jun 10 2022, 2:08 AM · Restricted Project, Restricted Project
simoll added a comment to D92086: Generalized PatternMatch & InstSimplify.

Some comments/thoughts inline.
In the meantime, upstream got new fp patterns that consider the fp environment. This should work just fine with this approach (see my comment on the consider function - we may even DCE fp-environment-aware-patterns for Traits that support fp but do not care about the fp env).

Jun 10 2022, 2:05 AM · Restricted Project, Restricted Project

Jun 9 2022

simoll updated the diff for D92086: Generalized PatternMatch & InstSimplify.

Here is the rebased patch for now. Currently two failing tests. This could be instsimplify opportunities enabled by this patch on constrained fp (have to look into it more).

Jun 9 2022, 9:16 AM · Restricted Project, Restricted Project
simoll committed rG746908a0380c: [NFC] Clang-format PatternMatch.h (authored by simoll).
[NFC] Clang-format PatternMatch.h
Jun 9 2022, 7:55 AM · Restricted Project, Restricted Project
simoll abandoned D126783: [NFC] clang-format InstructionSimplify.cpp.

D126889 has landed, which includes clang-formatting.

Jun 9 2022, 7:50 AM · Restricted Project, Restricted Project
simoll committed rGb8c2781ff601: [NFC] format InstructionSimplify & lowerCaseFunctionNames (authored by simoll).
[NFC] format InstructionSimplify & lowerCaseFunctionNames
Jun 9 2022, 7:10 AM · Restricted Project, Restricted Project
simoll closed D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.
Jun 9 2022, 7:10 AM · Restricted Project, Restricted Project
simoll added a comment to D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.

We got one accept and no opposition. This passes all testing locally. If pre-merge checks are ok (except for unrelated breakage), I will go forward and commit this.

Jun 9 2022, 6:29 AM · Restricted Project, Restricted Project
simoll updated the diff for D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.
  • lower case more static functions.
  • rebased.
Jun 9 2022, 6:24 AM · Restricted Project, Restricted Project

Jun 8 2022

simoll added a comment to D92086: Generalized PatternMatch & InstSimplify.

It looks like unfortunately the patch doesn't apply on current main. Does it have any dependencies or does it just need a rebase?

Jun 8 2022, 12:05 AM · Restricted Project, Restricted Project

Jun 7 2022

simoll added a comment to D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.

No objection from me, and I do maintain downstream forks. Can you provide a simple one-line command that I can run to make the equivalent changes in my downstream?

Apart from applying the patch, not really. This is clang-format + lower-case all function names in InstCombine.cpp/h + compile, wait for an error and lower-case the highlighted function name in the error message, repeat.

Jun 7 2022, 3:22 AM · Restricted Project, Restricted Project
simoll updated the diff for D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.

Cleanup, repair function names, comments, variables

Jun 7 2022, 3:17 AM · Restricted Project, Restricted Project

Jun 2 2022

simoll updated the diff for D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.

Also use the new function names in the unit tests..

Jun 2 2022, 9:02 AM · Restricted Project, Restricted Project
simoll added a reviewer for D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames: rengolin.
Jun 2 2022, 8:57 AM · Restricted Project, Restricted Project
simoll added a comment to D92086: Generalized PatternMatch & InstSimplify.

Btw, no matter where you stand on D126889 - the patch lower-cases all InstructionSimplify function names - the diff shows just how many different passes actually rely on instruction simplification. Those passes could potentially benefit from generalized pattern matching (even if its limited to instsimplify/combine).

Jun 2 2022, 8:54 AM · Restricted Project, Restricted Project
simoll added a comment to D126783: [NFC] clang-format InstructionSimplify.cpp.

LGTM - I'd go further and update the old "FunctionName" formatting too, so we are consistent with the current "functionName" style.

Many of the changed lines would already overlap, and if it's done in this patch, then that's still just this one potential commit that anyone would be looking past with 'blame'.

Jun 2 2022, 8:44 AM · Restricted Project, Restricted Project
simoll requested review of D126889: [NFC] format InstructionSimplify & lowerCaseFunctionNames.
Jun 2 2022, 8:43 AM · Restricted Project, Restricted Project
simoll accepted D109584: [VP] Implementing expansion pass for VP load and store..

LGTM

Jun 2 2022, 2:24 AM · Restricted Project, Restricted Project, Restricted Project
simoll accepted D120664: [VP] IR expansion pass for VP gather and scatter.

LGTM

Jun 2 2022, 2:20 AM · Restricted Project, Restricted Project
simoll accepted D126847: [LegalizeTypes][VP] Add widen and split support for VP FP integer casting op..

LGTM
we should rework those messy switches in isel legalization someday.. but this isn't a blocker for this patch

Jun 2 2022, 1:38 AM · Restricted Project, Restricted Project
simoll added reviewers for D126783: [NFC] clang-format InstructionSimplify.cpp: reames, nikic, fhahn, spatel, hkmatsumoto.

Adding the top five people blamed for changed lines in 2022 as reviewers.. Are you okay with auto-formatting the whole file with one commit?

Jun 2 2022, 1:13 AM · Restricted Project, Restricted Project

Jun 1 2022

simoll added a comment to D92086: Generalized PatternMatch & InstSimplify.

First, I think this is a good idea and can eventually mitigate the general problem of intrinsics vs. instructions in other LLVM passes.

Thanks for chiming in!

Jun 1 2022, 9:04 AM · Restricted Project, Restricted Project
simoll added a comment to D126783: [NFC] clang-format InstructionSimplify.cpp.

Making this a Diff mostly because it changes (although non-functionally) such a huge file.

Jun 1 2022, 8:05 AM · Restricted Project, Restricted Project
simoll requested review of D126783: [NFC] clang-format InstructionSimplify.cpp.
Jun 1 2022, 8:03 AM · Restricted Project, Restricted Project

May 30 2022

simoll added a reverting change for rG2e2a8a2d9082: Revert "[VP] vp intrinsics are not speculatable": rG18c1ee04de44: Re-land "[VP] vp intrinsics are not speculatable" with test fix.
May 30 2022, 5:42 AM · Restricted Project, Restricted Project
simoll committed rG18c1ee04de44: Re-land "[VP] vp intrinsics are not speculatable" with test fix (authored by simoll).
Re-land "[VP] vp intrinsics are not speculatable" with test fix
May 30 2022, 5:42 AM · Restricted Project, Restricted Project, Restricted Project
simoll committed rG78a18d2b54e7: [VP] vp intrinsics are not speculatable (authored by simoll).
[VP] vp intrinsics are not speculatable
May 30 2022, 3:20 AM · Restricted Project, Restricted Project
simoll closed D125296: [VP] vp intrinsics are not speculatable.
May 30 2022, 3:20 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126625: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

Are you happy with me merging this back into D126363? Do you want to keep working on the VPlan-VP integration or what is your plan here?

yeah! I am very happy to continue working on vplan VP integration, and improving LLVM's vectorization to enable automatic vectorization for RVV.

! In D126625#3545290, @ym1813382441 wrote:

! In D126625#3545254, @simoll wrote:

Are you happy with me merging this back into D126363? Do you want to keep working on the VPlan-VP integration or what is your plan here?

And my English may not be very good. Please don't laught at me if you find something indecent of me :)

All good :) We will find a way to work together on the patches. I will send you an e-mail to discuss this.

May 30 2022, 3:18 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D125296: [VP] vp intrinsics are not speculatable.

Ping. Can we land this patch?

May 30 2022, 2:38 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126625: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

Are you happy with me merging this back into D126363? Do you want to keep working on the VPlan-VP integration or what is your plan here?

May 30 2022, 2:10 AM · Restricted Project, Restricted Project, Restricted Project

May 27 2022

simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

We store the the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get required value using State.get(EVL, Part).
In this case we can treat EVL similarly to canonical iv, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that where using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as operand.

Better to treat EVL as CanonicalIV. Yes, I agree that the recipe is better choice here (similar to CaonicalIV) but it requires some extra work because of VPWidenIntOrFpInductionRecipe should depend on EVL. Probably, need to split VPWidenIntOrFpInductionRecipe to a PHI recipe and something like CanonicalIVIncrement, otherwise this dependency prevents it from being vectorized effectively.

May 27 2022, 7:51 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

We need to generate code to compute the EVL, its operand should be application vector length ,and EVL maybe depends on the canonical induction recipes.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
Various situations show that we need to model it as recipe.
And "llvm.set.evl" intrinsic need to be added.
We may refer to RVV architecture to set some restrictions for "llvm.set.evl", like...

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to converge on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

Shall we rename it to something like VPCanonicalEVL... instead? It is scalar.

May 27 2022, 5:43 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

We need to generate code to compute the EVL, its operand should be application vector length ,and EVL maybe depends on the canonical induction recipes.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
Various situations show that we need to model it as recipe.
And "llvm.set.evl" intrinsic need to be added.
We may refer to RVV architecture to set some restrictions for "llvm.set.evl", like...

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl
May 27 2022, 4:53 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

We store the the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get required value using State.get(EVL, Part).
In this case we can treat EVL similarly to canonical iv, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that where using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as operand.

Better to treat EVL as CanonicalIV. Yes, I agree that the recipe is better choice here (similar to CaonicalIV) but it requires some extra work because of VPWidenIntOrFpInductionRecipe should depend on EVL. Probably, need to split VPWidenIntOrFpInductionRecipe to a PHI recipe and something like CanonicalIVIncrement, otherwise this dependency prevents it from being vectorized effectively.

If we need to generate code to compute the EVL, it should be modeled as recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be a additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require HUGE(!!!) amount of changes in the Vplan. I assume, there still will be the same recipe/vpvalue for EVL across of all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it the adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

I suppose? These are the differences between VPPredicatedWidenRecipe and VPWidenRecipe, i see:

  • Mask & EVL operands. The Mask operand could be optional (which implies a constant all-one mask), EVL is mandatory.
  • EVL has significant performance implications on RVV and VE. If you count costing as a VPlan based analysis, then that would be one analysis in favor for having two recipe kinds.
  • VPPredicatedWiden* recipes always widen to VP intrinsics. VPWiden* don't.
May 27 2022, 4:47 AM · Restricted Project, Restricted Project, Restricted Project
simoll accepted D126457: [NFC][VP] Fix llvm.vp.merge intrinsic Expansion in LangRef.
May 27 2022, 12:40 AM · Restricted Project, Restricted Project

May 25 2022

simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

Hi Simon, did you think about making EVL a member of VPlan just like TripCount? In this case we might be not needed lots of these new classes.

Hi Alexey! The EVL behaves like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation. In particular @hahn 's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get required value using State.get(EVL, Part).
In this case we can treat EVL similarly to canonical iv, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

May 25 2022, 8:43 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

Hi Simon, did you think about making EVL a member of VPlan just like TripCount? In this case we might be not needed lots of these new classes.

Hi Alexey! The EVL behaves like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation. In particular @hahn 's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

May 25 2022, 8:06 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.

Hi Simon, did you think about making EVL a member of VPlan just like TripCount? In this case we might be not needed lots of these new classes.

May 25 2022, 7:22 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D125296: [VP] vp intrinsics are not speculatable.

To clarify: This Diff is the second sub patch and ready for review. I changed my mind on the whole adding-another-sub-patch-and-abandoning-this-patch thing.

May 25 2022, 7:12 AM · Restricted Project, Restricted Project, Restricted Project
simoll updated the diff for D125296: [VP] vp intrinsics are not speculatable.

VP intrinsics show UB if the %evl parameter is out of bounds - they must not carry the speculatable attribute. The out-of-bounds UB disappears when the %evl parameter is expanded into the mask or expansion replaces the entire VP intrinsic with non-VP code.

May 25 2022, 7:10 AM · Restricted Project, Restricted Project, Restricted Project
simoll committed rG6e12711081d7: [VP][fix] Don't discard masks in reductions (authored by simoll).
[VP][fix] Don't discard masks in reductions
May 25 2022, 6:55 AM · Restricted Project, Restricted Project
simoll closed D126362: [VP][fix] Don't discard masks in reductions.
May 25 2022, 6:55 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a reviewer for D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization: bmahjour.

This review replaces D104608 , which I am effectively commandeering as a co-author for the four original sub-patches by Vineet Kumar.

May 25 2022, 3:15 AM · Restricted Project, Restricted Project, Restricted Project
simoll requested review of D126363: [VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization.
May 25 2022, 3:10 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D125296: [VP] vp intrinsics are not speculatable.

Please review the first of two sub patches: https://reviews.llvm.org/D126362

May 25 2022, 2:44 AM · Restricted Project, Restricted Project, Restricted Project
simoll requested review of D126362: [VP][fix] Don't discard masks in reductions.
May 25 2022, 2:42 AM · Restricted Project, Restricted Project, Restricted Project

May 24 2022

simoll added a comment to D125296: [VP] vp intrinsics are not speculatable.

Note from syncup call: Split up into a fixup for vp reduction expansion and a second patch for the speculatable flag.

May 24 2022, 8:15 AM · Restricted Project, Restricted Project, Restricted Project
simoll added inline comments to D126273: [DAGCombiner][VP] Add DAGCombine for merging VP_FADD and VP_FMUL to VP_FMA..
May 24 2022, 6:47 AM · Restricted Project, Restricted Project
simoll added inline comments to D126273: [DAGCombiner][VP] Add DAGCombine for merging VP_FADD and VP_FMUL to VP_FMA..
May 24 2022, 6:36 AM · Restricted Project, Restricted Project

May 20 2022

Herald added a project to D93755: [VE][isel] Map EVT vectors to vector registers.: Restricted Project.

@kaz7 @efocht Does either of you want to commandeer this patch and become the patch author?

May 20 2022, 5:41 AM · Restricted Project, Restricted Project, Restricted Project
simoll abandoned D71340: [VE,#3] Runtime libraries.
May 20 2022, 5:38 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project
simoll added inline comments to D125296: [VP] vp intrinsics are not speculatable.
May 20 2022, 2:34 AM · Restricted Project, Restricted Project, Restricted Project
simoll updated the diff for D125296: [VP] vp intrinsics are not speculatable.
  • More documentation on isSafeToSpeculativelyExecuteWithOpcode
  • VP Intrinsics cleanup in Intrinsics.td
  • Rebased
  • Formatting
May 20 2022, 2:34 AM · Restricted Project, Restricted Project, Restricted Project

May 10 2022

simoll updated the diff for D125296: [VP] vp intrinsics are not speculatable.

Re-enable crossed out RUN: lines

May 10 2022, 2:46 AM · Restricted Project, Restricted Project, Restricted Project
simoll published D125296: [VP] vp intrinsics are not speculatable for review.
May 10 2022, 2:42 AM · Restricted Project, Restricted Project, Restricted Project

Mar 23 2022

simoll added inline comments to D121292: [VP] Add vp.fcmp comparison intrinsic and docs.
Mar 23 2022, 7:01 AM · Restricted Project, Restricted Project

Mar 22 2022

simoll committed rG7de383c89213: [VP] Fix VPintrinsic::getStaticVectorLength for vp.merge|select (authored by simoll).
[VP] Fix VPintrinsic::getStaticVectorLength for vp.merge|select
Mar 22 2022, 3:43 AM · Restricted Project
simoll closed D121913: [VP] Fix VPintrinsic::getStaticVectorLength for vp.merge|select.
Mar 22 2022, 3:42 AM · Restricted Project, Restricted Project, Restricted Project

Mar 21 2022

simoll edited reviewers for D120094: [CallingConv] Generate isCCArgumentReg() predicate via tablegen, added: kaz7; removed: simoll.
Mar 21 2022, 5:35 AM · Restricted Project, Restricted Project
simoll added a comment to D122042: [VP] Preserve address space of pointer for strided load/store intrinsics..

Update for vp_strided_load|store.ll tests to be fused into this patch (no attribution necessary):

Mar 21 2022, 3:23 AM · Restricted Project, Restricted Project
simoll accepted D122032: [VP] Make VectorBuilder take IRBuilderBase instead of IRBuilder<>.
Mar 21 2022, 2:17 AM · Restricted Project, Restricted Project
simoll updated the diff for D121913: [VP] Fix VPintrinsic::getStaticVectorLength for vp.merge|select.

Document that vp.merge, vp.select do not have a VP mask and why.

Mar 21 2022, 1:15 AM · Restricted Project, Restricted Project, Restricted Project
simoll added a comment to D122042: [VP] Preserve address space of pointer for strided load/store intrinsics..

Landing this will break VP strided load/store RVV and VE tests. I'll attach a diff for the VE ones that you can merge into this patch.

Mar 21 2022, 1:11 AM · Restricted Project, Restricted Project

Mar 18 2022

simoll added inline comments to D121714: [VP] fm flag transfer to SDNodes [VE] VVP_FMA fusion.
Mar 18 2022, 5:28 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project
simoll updated the diff for D121714: [VP] fm flag transfer to SDNodes [VE] VVP_FMA fusion.
  • alphabetic file order in the CMakeLists.txt
  • unchanged, just to reiterate: contract only checked on the fmul, not on the fadd.
  • formatting
Mar 18 2022, 3:22 AM · Restricted Project, Restricted Project, Restricted Project, Restricted Project

Mar 17 2022

simoll requested review of D121913: [VP] Fix VPintrinsic::getStaticVectorLength for vp.merge|select.
Mar 17 2022, 7:48 AM · Restricted Project, Restricted Project, Restricted Project