Page MenuHomePhabricator

simoll (Simon Moll)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 23 2017, 12:28 AM (113 w, 19 h)

Recent Activity

Tue, Mar 19

simoll updated the diff for D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
  • re-based onto master
Tue, Mar 19, 12:12 AM · Restricted Project

Mon, Mar 18

simoll added a comment to D58736: [System Model] Introduce a target system model.

This takes a while to digest. Some quick remarks for now (also inline):

  • Is there a way to query the number of (automatic) HW prefetchers?
  • Does the interface provide the latency of each cache level (hit)/memory (miss)?
Mon, Mar 18, 9:57 PM · Restricted Project
simoll added a comment to D59042: [SDA] Bug fix: Use IPD outside the loop as divergence bound.

Ping. This is a bug fix for the SDA.

Mon, Mar 18, 8:53 PM · Restricted Project
simoll retitled D59042: [SDA] Bug fix: Use IPD outside the loop as divergence bound from [SDA] Use IPD outside the loop as divergence bound to [SDA] Bug fix: Use IPD outside the loop as divergence bound.
Mon, Mar 18, 8:52 PM · Restricted Project

Fri, Mar 8

simoll updated subscribers of D59042: [SDA] Bug fix: Use IPD outside the loop as divergence bound.

Adding more DA users as subscribers

Fri, Mar 8, 5:45 AM · Restricted Project

Wed, Mar 6

simoll created D59042: [SDA] Bug fix: Use IPD outside the loop as divergence bound.
Wed, Mar 6, 12:52 PM · Restricted Project

Feb 13 2019

simoll updated the diff for D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Renamed EVL to VP

Feb 13 2019, 7:39 AM · Restricted Project

Feb 11 2019

simoll added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Documenting this here: Constrained EVL intrinsics will be necessary for trapping fp ops (https://lists.llvm.org/pipermail/llvm-dev/2019-January/129806.html).

Feb 11 2019, 4:16 AM · Restricted Project

Feb 6 2019

simoll updated the summary of D50433: A New Divergence Analysis for LLVM.
Feb 6 2019, 1:41 PM · Restricted Project
simoll planned changes to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 6 2019, 1:38 PM · Restricted Project
simoll added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Certainly, that's why I say it's secondary and mostly a hint that something is amiss with "the current thinking". In fact, I am by now inclined to propose that Jacob and collaborators start out by expressing their architecture's operations with target-specific intrinsics that also use the attributes introduced here (especially since none of the typical vectorizers are equipped to generate the sort of code they want from scalar code using e.g. float4 types).

just so you know: we're not using a "typical vectoriser", we are writing our own, and it is very very specifically designed to accommodate the float3 and float4 datatypes. it is a SPIR-V to LLVM-IR compiler (used by both OpenCL and 3D Shaders compatible with the Vulkan API). to reiterate: we are *not* expecting to vector-augment or even use clang-llvm, rustlang-llvm or other vectoriser, we are *directly* translating SPIR-V IR into LLVM IR.

Err.. re-vectorizing float3/float4 codes will mostly concern the vectorizer backend ("widening phase"), all other stages should accept vectors as "scalar" data types. I am talking about RV here, of course (https://github.com/cdl-saarland/rv) ;-)

Feb 6 2019, 12:49 PM · Restricted Project
simoll added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

This seems shaky. When generalized to scalable vector types, it means a load of a scalable vector would be evl.gather(<1 x double*> %p, <scalable n x i1>), which mixes fixed and scaled vector sizes. While it's no big deal to test the divisibility, allowing "mixed scalability" increases the surface area of the feature and not in a direction that seems desirable. For example, it strongly suggests permitting evl.add(<scalable n x i32>, <scalable n x i32>, <n x i1>, ...) where each mask bit controls vscale many lanes -- quite unnatural, and not something that seems likely to ever be put into hardware.

Mixing vector types and scalable vector types is illegal and is not what i was suggesting. Rather, a scalar pointer would be passed to convey a consecutive load/store from a single address.

Ok, sorry, I misunderstood the proposal then. That seems reasonable, although I remain unsure about the benefits.

And what for? I see technical disadvantages (less readable IR, needing more finicky pattern matching in the backend, more complexity in IR passes that work better on loads than on general gathers) and few if any technical advantages. It's a little conceptual simplification, but only at the level of abstraction where you don't care about uniformity.

Less readable IR: the address computation would become simpler, eg there is no need to synthesize a consecutive constant only to have it pattern-matched and subsumed in the backend (eg <0, 1, 2, ..., 15, 0, 1, 2, 3.., 15, 0, ....>).

I don't really follow, here we were only talking about subsuming unit-stride loads under gathers (and likewise for stores/scatters), right? But anyway, I was mostly worried about a dummy 1-element vector and the extra type parameter(s) on the intrinsic, which isn't an issue with what you actually proposed.

Finicky pattern matching: it is trivial to legalize this by expanding it into a more standard gather/scatter, or splitting it into consecutive memory accesses.. we can even keep the EVL_LOAD, EVL_STORE SDNodes in the backend so you woudn't even realize that an llvm.evl.gather was used for a consecutive load on IR level.

I'm not talking about legalizing a unit-stride access into a gather/scatter (when would that be necessary? everyone who has scatter/gather also has unit-stride), but about recognizing that the "gather" or "scatter" is actually unit strided. Having separate SDNodes would solve that by handling it once for all targets in SelectionDAG construction/combines.

More complexity on IR passes for standard loads: We are already using intrinsics here and i'd rather advocate to push for a solution that makes general passes work on predicated gather/scatter (which is not the case atm, afaik).

I expect that quite a few optimizations and analyses that work for plain old loads don't work for gathers, or have to work substantially harder to work on gathers, maybe even with reduced effectiveness. I do want scatters and gathers to be optimized well, too, but I fear we'll instead end up with a pile of "do one thing if this gather is a unit-stride access, else do something different [or nothing]".

How does this plane interact with the later stages of the roadmap? At stage 5, a {Load,Store}Inst with vlen and mask is a unit-stride access, and gathers are left out in the rain, unless you first generalize load and store instructions to general gathers and scatters (which seems a bit radical).

But again, we can leave this out of the first version and keep discussing as an extension.

Yeah, this is a question of what's the best canonical form for unit-stride memory accesses, that can be debated when the those actually exist in tree.

I was referring to the generalized gather, for example:

Feb 6 2019, 4:57 AM · Restricted Project

Feb 5 2019

simoll added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

We will also need to adjust gather/scatter and possibly other load/store kinds to allow the address vector length to be a divisor of the main vector length (similar to mask vector length). I didn't check if there are intrinsics for strided load/store, those will need to be changed too, to allow, for example, storing <scalable 3 x float> to var.v in:

.. and as a side effect evl_load/evl_store are subsumed by evl_gather/evl_scatter:

evl.load(%p, %M, %L) ==  evl.gather(<1 x double*> %p, <256 x i1>..) ==  evl.gather(double* %p, <256 x i1> %M, i32 %L)

This seems shaky. When generalized to scalable vector types, it means a load of a scalable vector would be evl.gather(<1 x double*> %p, <scalable n x i1>), which mixes fixed and scaled vector sizes. While it's no big deal to test the divisibility, allowing "mixed scalability" increases the surface area of the feature and not in a direction that seems desirable. For example, it strongly suggests permitting evl.add(<scalable n x i32>, <scalable n x i32>, <n x i1>, ...) where each mask bit controls vscale many lanes -- quite unnatural, and not something that seems likely to ever be put into hardware.

Mixing vector types and scalable vector types is illegal and is not what i was suggesting. Rather, a scalar pointer would be passed to convey a consecutive load/store from a single address.

Feb 5 2019, 2:36 AM · Restricted Project
simoll added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 5 2019, 2:22 AM · Restricted Project

Feb 4 2019

simoll added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 4 2019, 1:39 PM · Restricted Project

Feb 1 2019

simoll added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 1 2019, 11:27 AM · Restricted Project
simoll added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

We will also need to adjust gather/scatter and possibly other load/store kinds to allow the address vector length to be a divisor of the main vector length (similar to mask vector length). I didn't check if there are intrinsics for strided load/store, those will need to be changed too, to allow, for example, storing <scalable 3 x float> to var.v in:

.. and as a side effect evl_load/evl_store are subsumed by evl_gather/evl_scatter:

Feb 1 2019, 5:59 AM · Restricted Project
simoll added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 1 2019, 2:02 AM · Restricted Project

Jan 31 2019

simoll accepted D53695: Scalable VectorType RFC.
Jan 31 2019, 6:49 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

I've opened a new RFC for a roadmap for vector predication (and a more up-to-date EVL prototype) - https://reviews.llvm.org/D57504 .

Jan 31 2019, 3:16 AM
simoll created D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Jan 31 2019, 3:13 AM · Restricted Project

Jan 23 2019

simoll updated the diff for D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  • FMA fusion! DAGCombiner lifted to work on EVL SDNodes as well as on regular SDNodes.
  • Native EVL SDNodes on ISel level.
  • Various fixes: gather/scatter cleanup, canonicalized reduction intrinsics, issues in TableGen's intrinsic generator code, ..
Jan 23 2019, 8:38 AM

Jan 18 2019

simoll updated the diff for D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  • EVL intrinsics no longer use the passthru attribute. An explicit select should be used to obtain defined vector elements where the mask in the intrinsic was false. passthru is still useful for general functions as in call @foo.
  • The %passthru argument of llvm.evl.gather in favor of a select-based pattern as above.
  • DAGBuilder integration (llvm.evl.fadd -> evl_fadd SDNode).
  • EVLBuilder convenience builder for EVL intrinsics, allows direct mapping from scalar instructions to EVL intrinsics.
Jan 18 2019, 7:48 AM

Jan 8 2019

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

In the general case (call @foo), the result on masked-off lanes is just unknown.

Would declaring the elements undef have any advantages? A use of undef is known to be, well, undefined behavior and the optimizer can take advantage of that. If the value is simply "unknown" I don't think the same can be done.

Well, there is a clear downside to declaring masked-off return lanes undef by default:
Say, a user defines a function @foo that takes a mask and produces a well-defined result on masked-off lanes. With undef on masked-off lanes, the user is not able to use the mask attribute for the mask argument.
That means that @foo is precluded from calling conventions that require an annotated mask (which may exist at some point).

Jan 8 2019, 8:05 AM

Dec 21 2018

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

I understand why BinaryOperator inherits from PredicatedBinaryOperator (so existing code looking at BinaryOperators won't optimize predicated code it doesn't know about). It's little mind-bending as it's not really an is-a relationship anymore. Is there code that might look at Instructions and have the same problem? That is, could code that looks at Instruction illegally optimize predicated code it's not aware of?

It is actually very much an is-a relationship because an unpredicated operator is an (optionally) predicated operator that never has a predicate (so it`s a proper subset functionality wise).
It's still just an intrinsic so i do not see how transformations that only look at Instruction and don't dig deeper could break the EVL intrinsic call.

Dec 21 2018, 10:14 PM
simoll updated the diff for D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Changes:

  • maskedout_ret -> passthru.
  • removed legacy alignment argument from scatter/gather.
  • fixed some attribute placements in EVL intrinsics.
Dec 21 2018, 10:26 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

@simoll
Sorry, I'm joining this conversation late.

No worries. You are here now :)

Dec 21 2018, 9:59 AM

Nov 26 2018

simoll added a comment to D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.

LGTM. Do you need me to commit this?

Yes, i cannot commit myself.

Nov 26 2018, 12:15 AM

Nov 9 2018

simoll added inline comments to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
Nov 9 2018, 5:15 AM

Nov 7 2018

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Today I took a stab at changing my RVV patches to use these intrinsics and that basically went well, affirming belief that these intrinsics are a good fit for RISC-V vectors. I stashes those changes for now rather than continuing to build on them because currently I can't match them with plain old isel patterns so I'd have to write annoying and error-prone custom lowering. That should be a temporary issue, partly due to how I don't really handle predication at the moment, partly due to a surprising extra argument on loads and stores (see inline comment).

That's great news! Thanks for trying it out. Speaking of ISel, there should probably be one new ISD node type per EVL intrinsic.

FYI I noticed the argument numbers for the new attributes don't match the actual parameters in many cases (they often seem to be off by one). No big deal, just something to keep in mind for when the RFC goes through and the patch gets submitted for real.

The patch in this RFC is a showcase version to discuss the general concept (and sort out bike shedding issues). The actual patches will be cleaner.

Nov 7 2018, 8:41 AM
simoll updated the diff for D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  • dynamic_vl -> vlen.
  • unmasked_ret -> maskedout_ret.
  • DVL -> EVL (Explicit Vector Length).
  • Added llvm.evl.compose(%A, %B, %pivot, %mvl) intrinsic (select on lane pivot).
Nov 7 2018, 8:04 AM

Nov 3 2018

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  1. there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

Well, if you generate RISC-V instructions starting from EVL intrinsics then undef-on-excess still holds. So, excess lanes should be fair game for spilling. My hope is that %dvl could be annotated on MIR level like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.

Nov 3 2018, 2:50 PM

Nov 1 2018

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi @simoll ,

Definition 1: Vector Lane Predication (VLP)

For a given operation INST that operates on vector inputs of type <{scalable }n x type>, Vector Lane Predication is the operation of attaching to INST a vector with the same number of lanes {scalable }n, but with (boolean) lanes of type i1 that selects on which lanes the operation INST needs to be executed. This concept can be applied to any IR instruction that have vectors in any of the input operand or in the result.

This can be represented in the language as an additional parameter in form of a vector of i1 lanes.

To achieve VLP, the instruction cold be extended adding the VLP parameter as the last parameter ()this would make predicated INST distinguishable from the non predicated version):

%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in1, ..., <{scalable }n x i1> %VLP)

As you state here, predicated INSTs would be indistinguishable from unpredicated INSTs if you are unaware of predication. As a result, every existing transformation that touches vector instructions will happily ignore the predicate and break your code. In effect this is similar to using metadata to annotate the predicate.

Nov 1 2018, 3:03 AM

Oct 31 2018

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Ideally you'd want these intrinsics for all code, yes, but

  1. since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last

Actually, you could use custom legalization in ISelLowering for this. No pass involved.

Oct 31 2018, 8:14 AM
simoll added inline comments to D53695: Scalable VectorType RFC.
Oct 31 2018, 7:11 AM
simoll added a comment to D53695: Scalable VectorType RFC.

Throwing in my 2 cents on the legalization issue. Apart from that, LGTM.

Oct 31 2018, 7:06 AM
simoll added a comment to D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.

Ping. Would any subscriber volunteer to review? @nhaehnle ?

Oct 31 2018, 6:51 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Oct 31 2018, 6:20 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi @simoll

Here is an example of two dvl invocations, which compute the same result:

a) llvm_dvl_fadd.v256f64(%x, %y, <full mask>, 13)
Since 13 < 32, the hardware will only issue 1 operation to its SIMD execution units. The occupation is thus something like 13/32 ~ 40%.

b`) llvm_dvl_fadd.v256f64(%x, %y, <mask with first 13 bits set>, 256)
Since the DVL is 256, the hardware will issue 8 operations to its SIMD units. However, only the first 13 elements are relevant leading to an occupation of 13/256 ~ 5%.

Aha I see.

I guess we anticipate the compiler would not be able to tell that a mask corresponds to a dvl if we chose not to represent it. I can imagine this happening to a function vectorised with an arbitrary mask, does this align with your expectations too?

Thanks a lot for the clarification.

Yep, a call boundary would be the ultimate limit to the inferring-dvl-from-mask approach.

Oct 31 2018, 4:48 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Oct 31 2018, 3:39 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

The dynamic vector length is explicit because it crucially impacts the performance of vector instructions for SX-Aurora and RISC-V V (depending on hardware implementation).

Oct 31 2018, 2:23 AM

Oct 29 2018

simoll planned changes to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi @simoll

Yes, the changes in this RFC are compatible with a physical simdlen that is unknown at compile time.

thanks, this is good to know. Apologies if I side-tracked a little bit the discussion, I was a bit concerned of seeing fixed-size vectors here and I wanted to dispel my doubts.

Regards,

Oct 29 2018, 3:51 AM

Oct 24 2018

simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Thanks a lot for this proposal! It's very unfortunate I couldn't be at the dev meeting to discuss in person.

This basic approach of intrinsics with both a regular mask parameter and an integer parameter for the number of lanes to process matches what I've been doing for RISC-V and it works well for that purpose, especially for the strip-mined loops @rogfer01 highlighted. I think the same approach should generalize to other architectures and to vectorized code that does not follow the strip-mining style:

  • The dynamic vector length, if not needed, can be set to the constant -1 (= UINT_MAX)
  • The mask parameter, if not needed, can be set to the constant <i1 true, i1 true, ...>

    Both of these settings are easy to recognize in legalization/ISel and in IR passes, so unpredicated dynamic-vl operations as well as predicated-but-full-length operations can be represented with (in principle) no loss of optimization power and only a little bit of boilerplate. That seems like a good trade off for not having multiple variants of every intrinsic.

Great, good to hear we are on the same page here.

Oct 24 2018, 2:39 AM
simoll added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Hi Roger,

Oct 24 2018, 12:43 AM

Oct 23 2018

simoll updated subscribers of D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

This patch is just for reference: it implements the three attributes and (some) intrinsic declarations.

Oct 23 2018, 3:02 PM
simoll retitled D53613: RFC: Explicit Vector Length Intrinsics and Attributes from RFC: Predicated Vector Intrinsics to RFC: Dynamic Vector Length Intrinsics and Attributes.
Oct 23 2018, 2:47 PM
simoll created D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
Oct 23 2018, 2:45 PM
simoll updated the summary of D50433: A New Divergence Analysis for LLVM.
Oct 23 2018, 1:23 AM · Restricted Project

Oct 22 2018

simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

Thanks for committing patch #2!

Oct 22 2018, 6:28 AM · Restricted Project
simoll created D53493: [DA] GPUDivergenceAnalysis for unstructured GPU kernels.
Oct 22 2018, 6:14 AM

Oct 17 2018

simoll added a comment to D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

Intel is ok with the patch as well as far as LV/VPlan is concerned (just talked to Matt Masten about this). I don't have commit rights myself. Can you commit this for me?

Oct 17 2018, 1:10 PM

Oct 10 2018

simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

NFC. Updated comments in DivergenceAnalysis.cpp.
This is in sync with (Diff 168983) of patch no. 2 (https://reviews.llvm.org/D51491).

Oct 10 2018, 4:45 AM · Restricted Project
simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

NFC. Updated comments in DivergenceAnalysis.cpp.

Oct 10 2018, 4:37 AM

Sep 18 2018

simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

This diff in sync with Diff 165927 of the reference revision D50433.

Sep 18 2018, 4:10 AM
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

This is git diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342444

Sep 18 2018, 4:04 AM · Restricted Project
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
  • Formatting
  • Standalone unit tests for the generic DivergenceAnalysis class w/o pass frontends (included in Patch no. 2). Unit tests include a simplified version of the diverge-switch-default test case of https://reviews.llvm.org/D52221.
Sep 18 2018, 3:59 AM · Restricted Project
simoll added a comment to D52221: [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

This workaround will not longer be required with the new Divergence Analysis (https://reviews.llvm.org/D50433). I will add this test case as a unit test for the new implementation.

Sep 18 2018, 3:27 AM

Sep 17 2018

simoll added a comment to D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

This patch should already add the tests for the new analysis, shouldn't it?

Sep 17 2018, 12:51 AM

Aug 31 2018

simoll added inline comments to D50433: A New Divergence Analysis for LLVM.
Aug 31 2018, 3:02 AM · Restricted Project
simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.
Aug 31 2018, 1:54 AM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
  • find() instead of count().
  • comments, typos
Aug 31 2018, 1:52 AM · Restricted Project

Aug 30 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Patch 2 is ready for review (https://reviews.llvm.org/D51491).

Aug 30 2018, 9:12 AM · Restricted Project
simoll updated the diff for D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.

Removed artifacts from patch #3 (LoopDivergencePrinter). This revision is in sync with the reference revision (https://reviews.llvm.org/D50433).

Aug 30 2018, 9:12 AM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
  • Patch 1 has been upstreamed (updated diff and summary).
  • Comment formatting.
Aug 30 2018, 8:33 AM · Restricted Project
simoll created D51491: [DA] DivergenceAnalysis for unstructured, reducible CFGs.
Aug 30 2018, 8:22 AM
simoll planned changes to D50433: A New Divergence Analysis for LLVM.

Thanks for committing patch #1 (https://reviews.llvm.org/rL341071)!
I will update this revision to reflect the outstanding changes.

Aug 30 2018, 7:36 AM · Restricted Project

Aug 28 2018

simoll updated the diff for D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

NFC (updated diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340809 91177308-0d34-0410-b5e6-96231b3b80d8).

Aug 28 2018, 4:57 AM

Aug 27 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Sorry, i am struggeling to follow.
Do you take the union of the PDF(P) for each immediate predecessor P of X? (where X is a potential join point).
That gives you invalid results.

      A
    /   \
   B     C
 /  \   /  \
D     E     F
 \  /   \  /
   G     H
   \    /
      I

PDF(G) = {E, B}
PDF(H) = {E, C}

PDF(G) join PDF(H) = {E, B, C} (where join is set union).
Yet, there are two disjoint paths from A to I. But A is in none of these sets.

You approach to the Control Dependence Analysis considering CFG only. You operate in terms ob BBs and branches.
I start from the PHI. The idea is simple: each point where 2 values converge has already been found while building SSA and is annotated with the PHI node.
I consider not PHI parent block predecessors PDF but PHI incoming values parent blocks PDFs.

Aug 27 2018, 1:34 PM · Restricted Project

Aug 24 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

For given branch all the blocks where PHI nodes must be inserted belongs to the branch parent block dominance frontier.

The problem with DF is that it implicitly assumes that there exists a definition at the function entry (see http://www.cs.utexas.edu/~pingali/CS380C/2010/papers/ssaCytron.pdf, comment below last equation of page 17).
So, we would get imprecise results.

I'm not sure that I understand you correct...
I still do not get an idea what do you mean by "imprecise results". The assumption in the paper you have referred looks reasonable.
Let's say we have 2 disjoint paths from Entry to block X and you have a use of some value V in X.

      Entry
     /   \
    A     E
  /  \     \
B     C     F  [v0 = 1]
  \  /      |
   D       /
    \ _   /
        X   v1 = PHI (1, F, undef, Entry)


      Entry__
     /       \
  A [v1 = 2]  E
 /  \         |
B     C       F  [v0 = 1]
 \  /      __/
  D       /
    \ _  /
      X v2 = PHI (1, F, 2, A)

Irrelevant of the definition in Entry, divergent branch in Entry makes the PHI in X divergent.

Each of this 2 paths should contain a definition for V. It does not matter in entry block or in one of it's successors.
You either have a normal PHI or a PHI with undef incoming value. How to handle undef depends of your decision.
You may conservatively consider any undef as divergent. This make your PHI divergent by data-dependency.

The SDA detects re-convergence points of disjoint divergent paths from a branch.
There aren't any real PHI nodes involved.
The PHI nodes are rather a vehicle to demonstrate the reduction to SSA construction.
The incoming values in the reduction are always uniform (x0 = <SomeConstant>).

Aug 24 2018, 10:28 AM · Restricted Project
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

This is git diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340606 91177308-0d34-0410-b5e6-96231b3b80d8.

Aug 24 2018, 7:04 AM · Restricted Project
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

General

  • Doxygen comments in DA, SDA headers.
  • LLVM Coding Style: upper-case first letter in variable names, comments, ...
  • Use find() instead of operator[] or count().
  • Use BasicBlock::phis() to traverse PHI nodes in BB.
  • Spelling, typos, ..
Aug 24 2018, 7:01 AM · Restricted Project
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

When marked Done without comment, the requested changes are in the upcoming revision.

Aug 24 2018, 2:49 AM · Restricted Project
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Also, maybe I missed it, but could you state clearly in a comment what the assumed preconditions are for correctness?

I will add comments to the class declarations of the DA and SDA - they require the CFG to be reducible.

Aug 24 2018, 2:32 AM · Restricted Project
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

When we discover the divergent control flow things tend to change. For each divergent branch we have to propagate the divergence computing the joins. For that we have to at least walk the successors until the immediate post dominator. We literally do same job as we'd done constructing the SSA form! And with the loops this is much more complicated.

Aug 24 2018, 2:04 AM · Restricted Project

Aug 23 2018

simoll planned changes to D50433: A New Divergence Analysis for LLVM.

Thank you for the feedback! I'll update this revision shortly.

Aug 23 2018, 6:08 AM · Restricted Project

Aug 22 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Could you please generate unified diff (with git diff) and attach it to the comment?

Sure. This is git diff against (git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340397 91177308-0d34-0410-b5e6-96231b3b80d8).

Aug 22 2018, 3:03 PM · Restricted Project
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.
Aug 22 2018, 5:52 AM · Restricted Project

Aug 16 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Ping. Are there any further remarks, change requests or questions?

Aug 16 2018, 6:29 AM · Restricted Project
simoll added a comment to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

Ping. How can i help getting this committed?

Aug 16 2018, 6:27 AM

Aug 10 2018

simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Yes, my approach requires pre-computation of post-dominance sets. Nevertheless, no predecessor walk is necessary. I compute the PDT sets using Cooper's "2 fingers" algorithm that uses linear walk of the post-dominator tree.
Given that the post-dominator tree is already used by the previous passes and has been constructed before the overhead is relatively small.

Ok. You still materialize the PDT sets per join point even before you know if any branch is divergent. I agree that the pre-processing cost should be negligible in the big picture. Btw, i suspect you technique could be made lazy as well.

Aug 10 2018, 5:34 AM · Restricted Project

Aug 9 2018

simoll updated the diff for D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

changed name KernelDivergenceAnalysis -> LegacyDivergenceAnalysis in accordance with reference revision.

Aug 9 2018, 2:09 PM
simoll updated the diff for D50433: A New Divergence Analysis for LLVM.

Changes:

  • changed new name for existing DA to LegacyDivergenceAnalysis.
  • no return after else.
  • grammar fix.
Aug 9 2018, 2:02 PM · Restricted Project
simoll added a comment to D50433: A New Divergence Analysis for LLVM.

Thanks for sharing :) I think our approaches are more similar than you might think:

Aug 9 2018, 8:16 AM · Restricted Project
simoll planned changes to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

This will be updated once all issues relating to the name change have been settled in the reference revision (https://reviews.llvm.org/D50433).

Aug 9 2018, 6:03 AM
simoll added inline comments to D50433: A New Divergence Analysis for LLVM.
Aug 9 2018, 6:00 AM · Restricted Project

Aug 8 2018

simoll updated the diff for D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

Renamed the include guard macros in KernelDivergenceAnalysis.h to reflect the name change from DivergenceAnalysis to KernelDivergenceAnalysis.
(LLVM_ANALYSIS_DIVERGENCE_ANALYSIS_H -> LLVM_ANALYSIS_KERNEL_DIVERGENCE_ANALYSIS_H)

Aug 8 2018, 8:14 AM
simoll added inline comments to D50433: A New Divergence Analysis for LLVM.
Aug 8 2018, 7:04 AM · Restricted Project
simoll added a comment to D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.

I don't have commit access. Can you commit on my behalf?

Aug 8 2018, 5:30 AM
simoll updated the summary of D50433: A New Divergence Analysis for LLVM.
Aug 8 2018, 3:48 AM · Restricted Project
simoll created D50434: [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis.
Aug 8 2018, 3:46 AM
simoll created D50433: A New Divergence Analysis for LLVM.
Aug 8 2018, 3:43 AM · Restricted Project

May 17 2018

simoll added a comment to D47006: [NFC] fix getOperandNo in PredIterator..

This fixes the following build error (c++ (GCC) 8.1.0):

May 17 2018, 1:37 AM
simoll created D47006: [NFC] fix getOperandNo in PredIterator..
May 17 2018, 1:27 AM