- User since: Jan 23 2017, 12:28 AM
Sep 17 2019
@foad Sorry for the delay. Feel free to commit on my behalf (I have been granted commit access but haven't found the time to set it up yet).
Sep 12 2019
One minor thing: VFABI parsing should succeed for any well-formed VFABI vector function name, not just for names whose ISA is listed in the VFISAKind enum. This would make the functionality available to external users.
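To make the suggestion concrete, here is a minimal C++ sketch of a parser that accepts any single-letter ISA token instead of checking it against a fixed list. It assumes the `_ZGV<isa><mask><vlen><params>_<scalarname>[(<redirection>)]` layout and only handles a simplified subset of the parameter grammar; `parseVFName` and `VFNameSketch` are hypothetical names for illustration, not LLVM's actual demangler API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not LLVM's VFInfo/VFShape).
struct VFNameSketch {
  char ISA;                        // any letter; deliberately not checked against an enum
  bool Masked;                     // 'M' = masked, 'N' = unmasked
  unsigned VLen;                   // 0 encodes a scalable vector length ('x')
  std::vector<std::string> Params; // raw parameter tokens, e.g. "v", "l4", "u"
  std::string ScalarName;
};

static bool isDigitAt(const std::string &S, std::size_t I) {
  return I < S.size() && std::isdigit(static_cast<unsigned char>(S[I]));
}

// Parses "_ZGV<isa><mask><vlen><params>_<scalarname>[(<redirection>)]".
// Simplified parameter grammar (an assumption of this sketch): one of
// v, l, u, R, L, U, optionally followed by 'n' or 's' plus digits, or by
// plain digits, and optionally by an alignment suffix 'a<digits>'.
std::optional<VFNameSketch> parseVFName(const std::string &Name) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0 || Name.size() <= Prefix.size() + 2)
    return std::nullopt;
  std::size_t P = Prefix.size();
  VFNameSketch R;
  R.ISA = Name[P++];
  if (!std::isalpha(static_cast<unsigned char>(R.ISA)))
    return std::nullopt;
  const char Mask = Name[P++];
  if (Mask != 'M' && Mask != 'N')
    return std::nullopt;
  R.Masked = (Mask == 'M');
  if (Name[P] == 'x') { // scalable vector length
    R.VLen = 0;
    ++P;
  } else {
    const std::size_t Start = P;
    while (isDigitAt(Name, P))
      ++P;
    if (P == Start)
      return std::nullopt;
    R.VLen = static_cast<unsigned>(std::stoul(Name.substr(Start, P - Start)));
    if (R.VLen == 0)
      return std::nullopt;
  }
  while (P < Name.size() && Name[P] != '_') {
    if (std::string("vluRLU").find(Name[P]) == std::string::npos)
      return std::nullopt;
    const std::size_t Start = P++;
    if (P < Name.size() && (Name[P] == 'n' || Name[P] == 's')) {
      ++P;
      if (!isDigitAt(Name, P))
        return std::nullopt;
    }
    while (isDigitAt(Name, P))
      ++P;
    if (P < Name.size() && Name[P] == 'a') { // optional alignment suffix
      ++P;
      if (!isDigitAt(Name, P))
        return std::nullopt;
      while (isDigitAt(Name, P))
        ++P;
    }
    R.Params.push_back(Name.substr(Start, P - Start));
  }
  if (P >= Name.size() || R.Params.empty())
    return std::nullopt;
  ++P; // skip the '_' separating the parameters from the scalar name
  const std::size_t Paren = Name.find('(', P);
  R.ScalarName = Name.substr(P, Paren == std::string::npos ? std::string::npos : Paren - P);
  if (R.ScalarName.empty())
    return std::nullopt;
  return R;
}
```

With this, a name such as `_ZGVqM8vl4u_bar` parses even though `q` is not a known ISA token; deciding whether a target can actually use such a variant is left to later stages.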
Aug 23 2019
Merged commits in preparation for landing
Aug 7 2019
This is a "keepalive" message: I will get back to working on LLVM-VP in October.
Jul 26 2019
Added b42473 r1.ll as a test case (failing assert on missing def at loop header).
Jul 25 2019
May 15 2019
Apr 18 2019
Thanks for your feedback!
Apr 17 2019
According to the LLVM LangRef, "fpexcept.ignore" seems to be the right option for exceptions, whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid for this seems hacky.
Apr 16 2019
- added constrained fp intrinsics (IR level only).
- initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.
NFC. Stripped empty lines.
Mar 19 2019
- re-based onto master
Mar 18 2019
This takes a while to digest. Some quick remarks for now (also inline):
- Is there a way to query the number of (automatic) HW prefetchers?
- Does the interface provide the latency of a hit in each cache level and of a miss to memory?
Ping. This is a bug fix for the SDA.
Mar 8 2019
Adding more DA users as subscribers
Mar 6 2019
Feb 13 2019
Renamed EVL to VP
Feb 11 2019
Documenting this here: Constrained EVL intrinsics will be necessary for trapping fp ops (https://lists.llvm.org/pipermail/llvm-dev/2019-January/129806.html).
Feb 6 2019
- EVL -> VP (for vector predication). That is, llvm.vp.fsub and vp_fsub (https://reviews.llvm.org/D57504#inline-509343).
- The unit of the vlen parameter is one vector element (to clarify: the interpretation of vlen is unaffected by the vscale of scalable vector types) (https://reviews.llvm.org/D57504#1387621)
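As a sanity check on these semantics, here is a hypothetical C++ reference model of a predicated fadd in the spirit of llvm.vp.fadd. It assumes that lane i is computed only when i < vlen and the mask bit is set, that all other lanes are undef (modelled as std::nullopt), and that vlen counts elements regardless of any vscale factor. `vpFAdd` is an illustrative name, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A lane is either a defined value or undef (modelled as std::nullopt).
using Lane = std::optional<float>;

// Hypothetical reference model of a vector-predicated fadd:
//  - lane I is computed iff I < EVL and Mask[I] is set,
//  - every other lane is undef,
//  - EVL is measured in elements, independent of vscale.
std::vector<Lane> vpFAdd(const std::vector<float> &A, const std::vector<float> &B,
                         const std::vector<bool> &Mask, std::size_t EVL) {
  assert(A.size() == B.size() && A.size() == Mask.size());
  std::vector<Lane> R(A.size()); // all lanes start out undef
  for (std::size_t I = 0; I < A.size() && I < EVL; ++I)
    if (Mask[I])
      R[I] = A[I] + B[I];
  return R;
}
```

An unpredicated fadd is then the special case of an all-true mask with EVL equal to the full vector width.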
Err... re-vectorizing float3/float4 code will mostly concern the vectorizer backend ("widening phase"); all other stages should accept vectors as "scalar" data types. I am talking about RV here, of course (https://github.com/cdl-saarland/rv) ;-)
I was referring to the generalized gather, for example:
Feb 5 2019
Mixing vector types and scalable vector types is illegal and is not what I was suggesting. Rather, a scalar pointer would be passed to convey a consecutive load/store from a single address.
Feb 4 2019
Feb 1 2019
.. and as a side effect evl_load/evl_store are subsumed by evl_gather/evl_scatter:
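A minimal C++ model of that subsumption, under the same assumed semantics (lanes at or beyond vlen and masked-off lanes are undef and never accessed): a consecutive load is just a gather whose lane addresses are Base, Base + 1, and so on. `vpGather` and `vpLoadViaGather` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a predicated gather: lane I reads *Ptrs[I] iff
// I < EVL and Mask[I] is set; all other lanes are undef (std::nullopt)
// and their pointers are never dereferenced.
std::vector<std::optional<int>> vpGather(const std::vector<const int *> &Ptrs,
                                         const std::vector<bool> &Mask, std::size_t EVL) {
  assert(Ptrs.size() == Mask.size());
  std::vector<std::optional<int>> R(Ptrs.size());
  for (std::size_t I = 0; I < Ptrs.size() && I < EVL; ++I)
    if (Mask[I])
      R[I] = *Ptrs[I];
  return R;
}

// A consecutive (unit-stride) load expressed as a gather over the addresses
// Base + 0, Base + 1, ... Lanes at or beyond EVL get a null pointer so that no
// out-of-range address is even formed.
std::vector<std::optional<int>> vpLoadViaGather(const int *Base, std::size_t NumLanes,
                                                const std::vector<bool> &Mask, std::size_t EVL) {
  std::vector<const int *> Ptrs(NumLanes, nullptr);
  for (std::size_t I = 0; I < NumLanes && I < EVL; ++I)
    Ptrs[I] = Base + I;
  return vpGather(Ptrs, Mask, EVL);
}
```

The same reasoning would let a consecutive store be written as a scatter over Base + I.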
Jan 31 2019
I've opened a new RFC for a roadmap for vector predication (and a more up-to-date EVL prototype) - https://reviews.llvm.org/D57504 .
Jan 23 2019
- FMA fusion! DAGCombiner lifted to work on EVL SDNodes as well as on regular SDNodes.
- Native EVL SDNodes on ISel level.
- Various fixes: gather/scatter cleanup, canonicalized reduction intrinsics, issues in TableGen's intrinsic generator code, ..
Jan 18 2019
- EVL intrinsics no longer use the passthru attribute. An explicit select should be used to obtain defined vector elements where the mask in the intrinsic was false. The passthru attribute is still useful for general functions, as in call @foo.
- Removed the %passthru argument of llvm.evl.gather in favor of a select-based pattern as above.
- DAGBuilder integration (llvm.evl.fadd -> evl_fadd SDNode).
- EVLBuilder: a convenience builder for EVL intrinsics that allows direct mapping from scalar instructions to EVL intrinsics.
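The select-based replacement for passthru can be modelled as follows. This is a hypothetical C++ sketch that again treats undef lanes as std::nullopt; `selectPassthru` is an illustrative name. Note that lanes beyond the explicit vector length stay undef even when their mask bit is set, because the select only fills the lanes whose mask bit is false.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of `select %mask, %vp_result, %passthru`: lanes whose
// mask bit is set keep the (possibly undef) predicated result, all other
// lanes take the passthru value and are therefore well defined.
std::vector<std::optional<float>>
selectPassthru(const std::vector<bool> &Mask,
               const std::vector<std::optional<float>> &VPResult,
               const std::vector<float> &Passthru) {
  assert(Mask.size() == VPResult.size() && Mask.size() == Passthru.size());
  std::vector<std::optional<float>> R(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    R[I] = Mask[I] ? VPResult[I] : std::optional<float>(Passthru[I]);
  return R;
}
```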
Jan 8 2019
Well, there is a clear downside to declaring masked-off return lanes undef by default:
Say, a user defines a function @foo that takes a mask and produces a well-defined result on masked-off lanes. With undef on masked-off lanes, the user is not able to use the mask attribute for the mask argument.
That means that @foo is precluded from calling conventions that require an annotated mask (which may exist at some point).
Dec 21 2018
It is actually very much an is-a relationship, because an unpredicated operator is an (optionally) predicated operator that never has a predicate (so it's a proper subset, functionality-wise).
It's still just an intrinsic, so I do not see how transformations that only look at Instruction and don't dig deeper could break the EVL intrinsic call.
- maskedout_ret -> passthru.
- removed legacy alignment argument from scatter/gather.
- fixed some attribute placements in EVL intrinsics.
No worries. You are here now :)
Nov 26 2018
Yes, I cannot commit myself.
Nov 9 2018
Nov 7 2018
That's great news! Thanks for trying it out. Speaking of ISel, there should probably be one new ISD node type per EVL intrinsic.
FYI I noticed the argument numbers for the new attributes don't match the actual parameters in many cases (they often seem to be off by one). No big deal, just something to keep in mind for when the RFC goes through and the patch gets submitted for real.
The patch in this RFC is a showcase version to discuss the general concept (and sort out bikeshedding issues). The actual patches will be cleaner.
- dynamic_vl -> vlen.
- unmasked_ret -> maskedout_ret.
- DVL -> EVL (Explicit Vector Length).
- Added llvm.evl.compose(%A, %B, %pivot, %mvl) intrinsic (select on lane pivot).
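A hypothetical C++ model of llvm.evl.compose, assuming "select on lane pivot" means that lanes below %pivot come from %A and the remaining lanes from %B. The %mvl operand is not modelled here; the sketch simply uses the width of %A, and `evlCompose` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of llvm.evl.compose(%A, %B, %pivot, %mvl), assuming
// lanes [0, Pivot) come from A and lanes [Pivot, width) come from B.
// The %mvl operand is not modelled; the width is taken from A.
template <typename T>
std::vector<T> evlCompose(const std::vector<T> &A, const std::vector<T> &B,
                          std::size_t Pivot) {
  assert(A.size() == B.size());
  std::vector<T> R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = I < Pivot ? A[I] : B[I];
  return R;
}
```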
Nov 3 2018
Well, if you generate RISC-V instructions starting from EVL intrinsics, then undef-on-excess still holds. So excess lanes should be fair game when spilling: they need not be saved. My hope is that %dvl could be annotated at the MIR level, like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.
Nov 1 2018
As you state here, predicated INSTs would be indistinguishable from unpredicated INSTs if you are unaware of predication. As a result, every existing transformation that touches vector instructions will happily ignore the predicate and break your code. In effect this is similar to using metadata to annotate the predicate.
Oct 31 2018
Actually, you could use custom legalization in ISelLowering for this. No pass involved.
Throwing in my 2 cents on the legalization issue. Apart from that, LGTM.
Ping. Would any subscriber volunteer to review? @nhaehnle ?
Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation and no change to the semantics of existing IR intrinsics.
Yep, a call boundary would be the ultimate limit to the inferring-dvl-from-mask approach.
With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora, where one full vector register is 256x8 bytes (2 KiB), and for ARM SVE as well.
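The arithmetic behind that argument, as a hypothetical C++ helper: under undef-beyond-vlen semantics only the first min(vlen, max lanes) elements of a register are defined, so that is all a spill has to save. For an SX-Aurora register (256 lanes of 8 bytes) this is 2048 bytes at full length but only 80 bytes when vlen is 10. `spillBytes` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes a spill must save when all lanes at or beyond
// VLen are undef, for a register of MaxLanes elements of ElemBytes each.
constexpr std::size_t spillBytes(std::size_t VLen, std::size_t MaxLanes,
                                 std::size_t ElemBytes) {
  return std::min(VLen, MaxLanes) * ElemBytes;
}

// Full-length SX-Aurora register: 256 lanes x 8 bytes.
static_assert(spillBytes(256, 256, 8) == 2048, "full register");
// Only the first 10 lanes are defined.
static_assert(spillBytes(10, 256, 8) == 80, "partial register");
```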
The dynamic vector length is explicit because it crucially impacts the performance of vector instructions for SX-Aurora and RISC-V V (depending on hardware implementation).
Oct 29 2018
Oct 24 2018
Great, good to hear we are on the same page here.
Oct 23 2018
This patch is just for reference: it implements the three attributes and (some) intrinsic declarations.
Oct 22 2018
Thanks for committing patch #2!
Oct 17 2018
Intel is ok with the patch as well as far as LV/VPlan is concerned (just talked to Matt Masten about this). I don't have commit rights myself. Can you commit this for me?
Oct 10 2018
NFC. Updated comments in DivergenceAnalysis.cpp.
This is in sync with Diff 168983 of patch no. 2 (https://reviews.llvm.org/D51491).
NFC. Updated comments in DivergenceAnalysis.cpp.
Sep 18 2018
This diff is in sync with Diff 165927 of the reference revision D50433.
This is a git diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342444
- Standalone unit tests for the generic DivergenceAnalysis class without pass frontends (included in Patch no. 2). Unit tests include a simplified version of the diverge-switch-default test case of https://reviews.llvm.org/D52221.
This workaround will no longer be required with the new Divergence Analysis (https://reviews.llvm.org/D50433). I will add this test case as a unit test for the new implementation.
Sep 17 2018
Aug 31 2018
- find() instead of count().
- typos, comments.
- This is in sync with Diff 163482 of the reference revision (https://reviews.llvm.org/D50433).
- find() instead of count().
- comments, typos
Aug 30 2018
Patch 2 is ready for review (https://reviews.llvm.org/D51491).
Removed artifacts from patch #3 (LoopDivergencePrinter). This revision is in sync with the reference revision (https://reviews.llvm.org/D50433).
- Patch 1 has been upstreamed (updated diff and summary).
- Comment formatting.
Thanks for committing patch #1 (https://reviews.llvm.org/rL341071)!
I will update this revision to reflect the outstanding changes.
Aug 28 2018
NFC (updated diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340809 91177308-0d34-0410-b5e6-96231b3b80d8).
Aug 27 2018
Aug 24 2018
The SDA detects re-convergence points of disjoint divergent paths from a branch.
There aren't any real PHI nodes involved.
The PHI nodes are rather a vehicle to demonstrate the reduction to SSA construction.
The incoming values in the reduction are always uniform (x0 = <SomeConstant>).
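The reduction can be sketched in a few lines of C++. This is a simplified illustration rather than the SDA implementation: it assumes an acyclic CFG whose blocks are numbered in topological order, gives each outgoing edge of the divergent branch its own (uniform) definition, and propagates reaching definitions forward. A block reached by two or more different definitions is where SSA construction would place a PHI, i.e. a join point. `CFGSketch` and `joinPoints` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical minimal CFG: blocks 0..N-1, numbered in a topological order of
// an acyclic CFG; Succs[B] lists the successors of block B.
struct CFGSketch {
  std::vector<std::vector<int>> Succs;
};

// Returns the join points of the divergent branch terminating block Br.
std::set<int> joinPoints(const CFGSketch &G, int Br) {
  const int N = static_cast<int>(G.Succs.size());
  assert(Br >= 0 && Br < N);
  // Incoming[B]: ids of the definitions reaching the entry of B.
  std::vector<std::set<int>> Incoming(N);
  int NextDef = 0;
  // Seed: every outgoing edge of the divergent branch carries its own
  // definition (x_i = <SomeConstant>).
  for (int S : G.Succs[Br])
    Incoming[S].insert(NextDef++);
  std::set<int> Joins;
  // All blocks reachable from Br come after it in topological order.
  for (int B = Br + 1; B < N; ++B) {
    if (Incoming[B].empty())
      continue; // not reachable from Br
    int Out;
    if (Incoming[B].size() > 1) {
      Joins.insert(B); // disjoint paths meet: SSA construction needs a PHI here
      Out = NextDef++; // the PHI is itself a new definition
    } else {
      Out = *Incoming[B].begin(); // a single reaching definition passes through
    }
    for (int S : G.Succs[B])
      Incoming[S].insert(Out);
  }
  return Joins;
}
```

For example, in a diamond 0 -> {1, 2}, 1 -> 3, 2 -> 3 the only join point of the branch in block 0 is block 3.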
This is a git diff against git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340606 91177308-0d34-0410-b5e6-96231b3b80d8.
- Doxygen comments in DA, SDA headers.
- LLVM Coding Style: upper-case first letter in variable names, comments, ...
- Use find() instead of operator[] or count().
- Use BasicBlock::phis() to traverse PHI nodes in BB.
- Spelling, typos, ..
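For readers unfamiliar with the find()-over-count() guideline, here is a small C++ illustration using std::map as a stand-in for the LLVM containers used in the patch: count() followed by operator[] performs two lookups (and operator[] would insert a default value on a miss), whereas find() performs one and never inserts. The function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Two lookups: count() to test membership, then operator[] to read the value.
// operator[] needs a non-const map because it inserts on a miss.
int lookupWithCount(std::map<std::string, int> &M, const std::string &Key) {
  if (M.count(Key))
    return M[Key];
  return -1;
}

// One lookup: the iterator returned by find() is reused for the access, and
// the map can stay const because nothing is ever inserted.
int lookupWithFind(const std::map<std::string, int> &M, const std::string &Key) {
  auto It = M.find(Key);
  return It != M.end() ? It->second : -1;
}
```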
When marked Done without comment, the requested changes are in the upcoming revision.
I will add comments to the class declarations of the DA and SDA: they require the CFG to be reducible.
Aug 23 2018
Thank you for the feedback! I'll update this revision shortly.
Aug 22 2018
Sure. This is a git diff against (git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340397 91177308-0d34-0410-b5e6-96231b3b80d8).
- Rebased on current trunk (git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340397 91177308-0d34-0410-b5e6-96231b3b80d8).
- Fixed divergent loop example in source code (lib/Analysis/DivergenceAnalysis.cpp : 168)
Aug 16 2018
Ping. Are there any further remarks, change requests or questions?
Ping. How can I help get this committed?
Aug 10 2018
Ok. You still materialize the PDT sets per join point even before you know whether any branch is divergent. I agree that the pre-processing cost should be negligible in the big picture. By the way, I suspect your technique could be made lazy as well.
Aug 9 2018
Changed name KernelDivergenceAnalysis -> LegacyDivergenceAnalysis in accordance with the reference revision.
- changed new name for existing DA to LegacyDivergenceAnalysis.
- no return after else.
- grammar fix.
Thanks for sharing :) I think our approaches are more similar than you might think:
This will be updated once all issues relating to the name change have been settled in the reference revision (https://reviews.llvm.org/D50433).
Aug 8 2018
Renamed the include guard macros in KernelDivergenceAnalysis.h to reflect the name change from DivergenceAnalysis to KernelDivergenceAnalysis.
(LLVM_ANALYSIS_DIVERGENCE_ANALYSIS_H -> LLVM_ANALYSIS_KERNEL_DIVERGENCE_ANALYSIS_H)