simoll (Simon Moll)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 23 2017, 12:28 AM (94 w, 4 d)

Recent Activity

Fri, Nov 9

simoll added inline comments to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
Fri, Nov 9, 5:15 AM

Wed, Nov 7

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Today I took a stab at changing my RVV patches to use these intrinsics and that basically went well, affirming belief that these intrinsics are a good fit for RISC-V vectors. I stashes those changes for now rather than continuing to build on them because currently I can't match them with plain old isel patterns so I'd have to write annoying and error-prone custom lowering. That should be a temporary issue, partly due to how I don't really handle predication at the moment, partly due to a surprising extra argument on loads and stores (see inline comment).

That's great news! Thanks for trying it out. Speaking of ISel, there should probably be one new ISD node type per EVL intrinsic.

FYI I noticed the argument numbers for the new attributes don't match the actual parameters in many cases (they often seem to be off by one). No big deal, just something to keep in mind for when the RFC goes through and the patch gets submitted for real.

The patch in this RFC is a showcase version to discuss the general concept (and sort out bike shedding issues). The actual patches will be cleaner.

Wed, Nov 7, 8:41 AM
simoll updated the diff for D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  • dynamic_vl -> vlen.
  • unmasked_ret -> maskedout_ret.
  • DVL -> EVL (Explicit Vector Length).
  • Added llvm.evl.compose(%A, %B, %pivot, %mvl) intrinsic (select on lane pivot).
Wed, Nov 7, 8:04 AM

Sat, Nov 3

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  1. there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

Well, if you generate RISC-V instructions starting from EVL intrinsics then undef-on-excess still holds. So, excess lanes should be fair game for spilling. My hope is that %dvl could be annotated on MIR level like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.

Sat, Nov 3, 2:50 PM

Thu, Nov 1

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi @simoll ,

Definition 1: Vector Lane Predication (VLP)

For a given operation INST that operates on vector inputs of type <{scalable }n x type>, Vector Lane Predication is the operation of attaching to INST a vector with the same number of lanes {scalable }n, but with (boolean) lanes of type i1 that selects on which lanes the operation INST needs to be executed. This concept can be applied to any IR instruction that have vectors in any of the input operand or in the result.

This can be represented in the language as an additional parameter in form of a vector of i1 lanes.

To achieve VLP, the instruction cold be extended adding the VLP parameter as the last parameter ()this would make predicated INST distinguishable from the non predicated version):

%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in1, ..., <{scalable }n x i1> %VLP)

As you state here, predicated INSTs would be indistinguishable from unpredicated INSTs if you are unaware of predication. As a result, every existing transformation that touches vector instructions will happily ignore the predicate and break your code. In effect this is similar to using metadata to annotate the predicate.

Thu, Nov 1, 3:03 AM

Wed, Oct 31

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Ideally you'd want these intrinsics for all code, yes, but

  1. since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last

Actually, you could use custom legalization in ISelLowering for this. No pass involved.

Wed, Oct 31, 8:14 AM
simoll added inline comments to D53695: Scalable VectorType RFC.
Wed, Oct 31, 7:11 AM
simoll added a comment to D53695: Scalable VectorType RFC.

Throwing in my 2 cents on the legalization issue. Apart from that, LGTM.

Wed, Oct 31, 7:06 AM
simoll added a comment to D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.

Ping. Would any subscriber volunteer to review? @nhaehnle ?

Wed, Oct 31, 6:51 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Wed, Oct 31, 6:20 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi @simoll

Here is an example of two dvl invocations, which compute the same result:

a) llvm_dvl_fadd.v256f64(%x, %y, <full mask>, 13)
Since 13 < 32, the hardware will only issue 1 operation to its SIMD execution units. The occupation is thus something like 13/32 ~ 40%.

b`) llvm_dvl_fadd.v256f64(%x, %y, <mask with first 13 bits set>, 256)
Since the DVL is 256, the hardware will issue 8 operations to its SIMD units. However, only the first 13 elements are relevant leading to an occupation of 13/256 ~ 5%.

Aha I see.

I guess we anticipate the compiler would not be able to tell that a mask corresponds to a dvl if we chose not to represent it. I can imagine this happening to a function vectorised with an arbitrary mask, does this align with your expectations too?

Thanks a lot for the clarification.

Yep, a call boundary would be the ultimate limit to the inferring-dvl-from-mask approach.

Wed, Oct 31, 4:48 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Wed, Oct 31, 3:39 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

The dynamic vector length is explicit because it crucially impacts the performance of vector instructions for SX-Aurora and RISC-V V (depending on hardware implementation).

Wed, Oct 31, 2:23 AM

Mon, Oct 29

simoll planned changes to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi @simoll

Yes, the changes in this RFC are compatible with a physical simdlen that is unknown at compile time.

thanks, this is good to know. Apologies if I side-tracked a little bit the discussion, I was a bit concerned of seeing fixed-size vectors here and I wanted to dispel my doubts.

Regards,

Mon, Oct 29, 3:51 AM

Wed, Oct 24

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Thanks a lot for this proposal! It's very unfortunate I couldn't be at the dev meeting to discuss in person.

This basic approach of intrinsics with both a regular mask parameter and an integer parameter for the number of lanes to process matches what I've been doing for RISC-V and it works well for that purpose, especially for the strip-mined loops @rogfer01 highlighted. I think the same approach should generalize to other architectures and to vectorized code that does not follow the strip-mining style:

  • The dynamic vector length, if not needed, can be set to the constant -1 (= UINT_MAX)
  • The mask parameter, if not needed, can be set to the constant <i1 true, i1 true, ...>

    Both of these settings are easy to recognize in legalization/ISel and in IR passes, so unpredicated dynamic-vl operations as well as predicated-but-full-length operations can be represented with (in principle) no loss of optimization power and only a little bit of boilerplate. That seems like a good trade off for not having multiple variants of every intrinsic.

Great, good to hear we are on the same page here.

Wed, Oct 24, 2:39 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi Roger,

Wed, Oct 24, 12:43 AM

Tue, Oct 23

simoll updated subscribers of D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

This patch is just for reference: it implements the three attributes and (some) intrinsic declarations.

Tue, Oct 23, 3:02 PM
simoll retitled D53613: RFC: Explicit Vector Length Intrinsics and Attributes from RFC: Predicated Vector Intrinsics to RFC: Dynamic Vector Length Intrinsics and Attributes.
Tue, Oct 23, 2:47 PM
simoll created D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
Tue, Oct 23, 2:45 PM
simoll updated the summary of D50433: A New Divergence Analysis for LLVM.
Tue, Oct 23, 1:23 AM

Mon, Oct 22

simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

Thanks for committing patch #2!

Mon, Oct 22, 6:28 AM
simoll created D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.
Mon, Oct 22, 6:14 AM

Wed, Oct 17

simoll added a comment to D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

Intel is ok with the patch as well as far as LV/VPlan is concerned (just talked to Matt Masten about this). I don't have commit rights myself. Can you commit this for me?

Wed, Oct 17, 1:10 PM

Oct 10 2018

simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

NFC. Updated comments in DivergenceAnalysis.cpp.
This is in sync with (Diff 168983) of patch no. 2 (https://reviews.llvm.org/D51491).

Oct 10 2018, 4:45 AM
simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

NFC. Updated comments in DivergenceAnalysis.cpp.

Oct 10 2018, 4:37 AM

Sep 18 2018

simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

This diff in sync with Diff 165927 of the reference revision D50433.

Sep 18 2018, 4:10 AM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

This is git diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342444

Sep 18 2018, 4:04 AM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
  • Formatting
  • Standalone unit tests for the generic DivergenceAnalysis class w/o pass frontends (included in Patch no. 2). Unit tests include a simplified version of the diverge-switch-default test case of https://reviews.llvm.org/D52221.
Sep 18 2018, 3:59 AM
simoll added a comment to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

This workaround will not longer be required with the new Divergence Analysis (https://reviews.llvm.org/D50433). I will add this test case as a unit test for the new implementation.

Sep 18 2018, 3:27 AM

Sep 17 2018

simoll added a comment to D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

This patch should already add the tests for the new analysis, shouldn't it?

Sep 17 2018, 12:51 AM

Aug 31 2018

simoll added inline comments to D50433: A New Divergence Analysis for LLVM.
Aug 31 2018, 3:02 AM
simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.
Aug 31 2018, 1:54 AM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
  • find() instead of count().
  • comments, typos
Aug 31 2018, 1:52 AM

Aug 30 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Patch 2 is ready for review (https://reviews.llvm.org/D51491).

Aug 30 2018, 9:12 AM
simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

Removed artifacts from patch #3 (LoopDivergencePrinter). This revision is in sync with the reference revision (https://reviews.llvm.org/D50433).

Aug 30 2018, 9:12 AM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
  • Patch 1 has been upstreamed (updated diff and summary).
  • Comment formatting.
Aug 30 2018, 8:33 AM
simoll created D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.
Aug 30 2018, 8:22 AM
simoll planned changes to D50433: A New Divergence Analysis for LLVM.

Thanks for committing patch #1 (https://reviews.llvm.org/rL341071)!
I will update this revision to reflect the outstanding changes.

Aug 30 2018, 7:36 AM

Aug 28 2018

simoll updated the diff for D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

NFC (updated diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340809 91177308-0d34-0410-b5e6-96231b3b80d8).

Aug 28 2018, 4:57 AM

Aug 27 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Sorry, i am struggeling to follow.
Do you take the union of the PDF(P) for each immediate predecessor P of X? (where X is a potential join point).
That gives you invalid results.

      A
    /   \
   B     C
 /  \   /  \
D     E     F
 \  /   \  /
   G     H
   \    /
      I

PDF(G) = {E, B}
PDF(H) = {E, C}

PDF(G) join PDF(H) = {E, B, C} (where join is set union).
Yet, there are two disjoint paths from A to I. But A is in none of these sets.

You approach to the Control Dependence Analysis considering CFG only. You operate in terms ob BBs and branches.
I start from the PHI. The idea is simple: each point where 2 values converge has already been found while building SSA and is annotated with the PHI node.
I consider not PHI parent block predecessors PDF but PHI incoming values parent blocks PDFs.

Aug 27 2018, 1:34 PM

Aug 24 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

For given branch all the blocks where PHI nodes must be inserted belongs to the branch parent block dominance frontier.

The problem with DF is that it implicitly assumes that there exists a definition at the function entry (see http://www.cs.utexas.edu/~pingali/CS380C/2010/papers/ssaCytron.pdf, comment below last equation of page 17).
So, we would get imprecise results.

I'm not sure that I understand you correct...
I still do not get an idea what do you mean by "imprecise results". The assumption in the paper you have referred looks reasonable.
Let's say we have 2 disjoint paths from Entry to block X and you have a use of some value V in X.

      Entry
     /   \
    A     E
  /  \     \
B     C     F  [v0 = 1]
  \  /      |
   D       /
    \ _   /
        X   v1 = PHI (1, F, undef, Entry)


      Entry__
     /       \
  A [v1 = 2]  E
 /  \         |
B     C       F  [v0 = 1]
 \  /      __/
  D       /
    \ _  /
      X v2 = PHI (1, F, 2, A)

Irrelevant of the definition in Entry, divergent branch in Entry makes the PHI in X divergent.

Each of this 2 paths should contain a definition for V. It does not matter in entry block or in one of it's successors.
You either have a normal PHI or a PHI with undef incoming value. How to handle undef depends of your decision.
You may conservatively consider any undef as divergent. This make your PHI divergent by data-dependency.

The SDA detects re-convergence points of disjoint divergent paths from a branch.
There aren't any real PHI nodes involved.
The PHI nodes are rather a vehicle to demonstrate the reduction to SSA construction.
The incoming values in the reduction are always uniform (x0 = <SomeConstant>).

Aug 24 2018, 10:28 AM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

This is git diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340606 91177308-0d34-0410-b5e6-96231b3b80d8.

Aug 24 2018, 7:04 AM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

General

  • Doxygen comments in DA, SDA headers.
  • LLVM Coding Style: upper-case first letter in variable names, comments, ...
  • Use find() instead of operator[] or count().
  • Use BasicBlock::phis() to traverse PHI nodes in BB.
  • Spelling, typos, ..
Aug 24 2018, 7:01 AM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

When marked Done without comment, the requested changes are in the upcoming revision.

Aug 24 2018, 2:49 AM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Also, maybe I missed it, but could you state clearly in a comment what the assumed preconditions are for correctness?

I will add comments to the class declarations of the DA and SDA - they require the CFG to be reducible.

Aug 24 2018, 2:32 AM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

When we discover the divergent control flow things tend to change. For each divergent branch we have to propagate the divergence computing the joins. For that we have to at least walk the successors until the immediate post dominator. We literally do same job as we'd done constructing the SSA form! And with the loops this is much more complicated.

Aug 24 2018, 2:04 AM

Aug 23 2018

simoll planned changes to D50433: A New Divergence Analysis for LLVM.

Thank you for the feedback! I'll update this revision shortly.

Aug 23 2018, 6:08 AM

Aug 22 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Could you please generate unified diff (with git diff) and attach it to the comment?

Sure. This is git diff against (git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340397 91177308-0d34-0410-b5e6-96231b3b80d8).

Aug 22 2018, 3:03 PM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
Aug 22 2018, 5:52 AM

Aug 16 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Ping. Are there any further remarks, change requests or questions?

Aug 16 2018, 6:29 AM
simoll added a comment to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

Ping. How can i help getting this committed?

Aug 16 2018, 6:27 AM

Aug 10 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Yes, my approach requires pre-computation of post-dominance sets. Nevertheless, no predecessor walk is necessary. I compute the PDT sets using Cooper's "2 fingers" algorithm that uses linear walk of the post-dominator tree.
Given that the post-dominator tree is already used by the previous passes and has been constructed before the overhead is relatively small.

Ok. You still materialize the PDT sets per join point even before you know if any branch is divergent. I agree that the pre-processing cost should be negligible in the big picture. Btw, i suspect you technique could be made lazy as well.

Aug 10 2018, 5:34 AM

Aug 9 2018

simoll updated the diff for D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

changed name KernelDivergenceAnalysis -> LegacyDivergenceAnalysis in accordance with reference revision.

Aug 9 2018, 2:09 PM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

Changes:

  • changed new name for existing DA to LegacyDivergenceAnalysis.
  • no return after else.
  • grammar fix.
Aug 9 2018, 2:02 PM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Thanks for sharing :) I think our approaches are more similar than you might think:

Aug 9 2018, 8:16 AM
simoll planned changes to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

This will be updated once all issues relating to the name change have been settled in the reference revision (https://reviews.llvm.org/D50433).

Aug 9 2018, 6:03 AM
simoll added inline comments to D50433: A New Divergence Analysis for LLVM.
Aug 9 2018, 6:00 AM

Aug 8 2018

simoll updated the diff for D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

Renamed the include guard macros in KernelDivergenceAnalysis.h to reflect the name change from DivergenceAnalysis to KernelDivergenceAnalysis.
(LLVM_ANALYSIS_DIVERGENCE_ANALYSIS_H -> LLVM_ANALYSIS_KERNEL_DIVERGENCE_ANALYSIS_H)

Aug 8 2018, 8:14 AM
simoll added inline comments to D50433: A New Divergence Analysis for LLVM.
Aug 8 2018, 7:04 AM
simoll added a comment to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

I don't have commit access. Can you commit on my behalf?

Aug 8 2018, 5:30 AM
simoll updated the summary of D50433: A New Divergence Analysis for LLVM.
Aug 8 2018, 3:48 AM
simoll created D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.
Aug 8 2018, 3:46 AM
simoll created D50433: A New Divergence Analysis for LLVM.
Aug 8 2018, 3:43 AM

May 17 2018

simoll added a comment to D47006: [NFC] fix getOperandNo in PredIterator..

This fixes the following build error (c++ (GCC) 8.1.0):

May 17 2018, 1:37 AM
simoll created D47006: [NFC] fix getOperandNo in PredIterator..
May 17 2018, 1:27 AM