- User Since
- May 11 2015, 7:59 AM (284 w, 10 h)
How about results from the LLVM test suite?
Yeah, I'm happy to do that for you.
Thu, Oct 15
I'm sorry for not finishing what I started with this intrinsic nonsense... Any simplification of these winding paths sounds great to me.
Sorry, I didn't realise this had been updated.
Fri, Oct 9
I wasn't so much concerned about latencies, but I was guessing that by moving the instruction, the goal is to get some more tail-predication "for free".
Indeed, this greatly simplifies the effort required to ensure that we can generate the LSTP version. And I can't see the latency of DLS having any real effect on the performance of a loop, especially not compared to all the other things that can go bad!
Thu, Oct 1
Can we rename the function to something like "ClobbersPredicate", if that's what we actually care about here?
+1 from me. I also wonder about renaming IncludesRemovable to something like Ignore/SkipDead?
Maybe we could use SCEV for the overflow checks? ;P
Code looks good to me and we've been running this downstream for a few years.
Removed the code that later tries to move stuff around.
Wed, Sep 30
Checking result of TryRemove.
Now using TryRemove for both LRDef cases.
Tue, Sep 29
I need to break this up and change it.
Mon, Sep 28
Thanks Dave, LGTM
Fri, Sep 25
Thu, Sep 24
Thanks. Yes, that's next on my TODO list (why would I want to do anything else?!) What I'd really like to do is simplify the logic and remove all the code that moves stuff around and just place the [D|W]LSTP as the last instruction in the preheader. We'll see how that goes...
Cheers, did some renaming and added the comment.
Sounds like a useful option to me!
Actually, I guess if you could prove that the tripcount is precisely equal to (ElementCount + VectorWidth - 1)/VectorWidth, you could also use that to prove the subtraction doesn't overflow.
This sounds like the same suggestion that I made many moons ago... I suggested taking these values and substituting them into the expected SCEV expression, then performing some SCEV algebra on it and the vector TC expression, until hopefully they both just equal ElementCount == ElementCount. My quick prototype 'worked', but I don't know if that says much.
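As a rough illustration of the trip-count argument in this exchange, here is a minimal numeric sketch, not the SCEV-based check itself. It assumes that "the subtraction" means the remaining element count, ElementCount - I * VectorWidth, computed at the start of iteration I, and that VectorWidth is non-zero. The helper names `vectorTripCount` and `remainingNeverWraps` are hypothetical and do not exist in LLVM.

```cpp
#include <cstdint>

// Hypothetical helper: ceil(ElementCount / VectorWidth).
// The sum is formed in 64 bits so ElementCount + VectorWidth - 1 cannot wrap
// for 32-bit inputs. Assumes VectorWidth != 0.
uint64_t vectorTripCount(uint32_t ElementCount, uint32_t VectorWidth) {
  return (uint64_t(ElementCount) + VectorWidth - 1) / VectorWidth;
}

// Hypothetical helper: returns true if, at the start of every iteration I in
// [0, TripCount), at least one element is still left to process, i.e. the
// unsigned value ElementCount - I * VectorWidth would not wrap below 1.
bool remainingNeverWraps(uint32_t ElementCount, uint32_t VectorWidth,
                         uint64_t TripCount) {
  for (uint64_t I = 0; I < TripCount; ++I) {
    uint64_t Processed = I * VectorWidth;
    if (Processed >= ElementCount)
      return false; // This iteration would start with nothing left.
  }
  return true;
}
```

With ElementCount = 10 and VectorWidth = 4, the exact trip count is 3 and the remaining counts are 10, 6 and 2, so nothing wraps. A trip count of 4 would start an iteration with 12 elements already processed, which is the case the exact-trip-count proof rules out.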
Wed, Sep 23
Tue, Sep 22
Rebased and added an assert.
Hey Sam, add a couple of tests please to catch when the defs are mismatched.
- Renamed VPTBlock to VPTState.
- Added a comment.
Mon, Sep 21
- All the VFP instructions have now had validForTailPredication and the DomainMVE removed.
- LowOverheadLoops will inspect any instructions that are in the DomainMVE, or if they use/def the VPR. Helpers for each are added.
- LowOverheadLoops continues to reject any use of the VPR that isn't a predicate operand.
- Rebased after adding some more tests for P0 usage.
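The inspection rule described in the list above could be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the pass's actual code: `Inst` is a hypothetical stand-in for a machine instruction (it is not LLVM's MachineInstr), and `shouldInspect` and `loopIsCandidate` are illustrative names.

```cpp
#include <vector>

// Hypothetical, simplified model of an instruction; all fields are
// assumptions made for illustration only.
struct Inst {
  bool InMVEDomain;        // Instruction is in the MVE domain.
  bool UsesVPR;            // Instruction reads the VPR.
  bool DefsVPR;            // Instruction writes the VPR.
  bool NonPredicateVPRUse; // Some VPR use is not a predicate operand.
};

// An instruction is inspected if it is in the MVE domain or uses/defs the VPR.
bool shouldInspect(const Inst &I) {
  return I.InMVEDomain || I.UsesVPR || I.DefsVPR;
}

// Reject the loop if any inspected instruction uses the VPR as something
// other than a predicate operand.
bool loopIsCandidate(const std::vector<Inst> &Body) {
  for (const Inst &I : Body) {
    if (!shouldInspect(I))
      continue; // Neither MVE nor VPR-related: ignored by this check.
    if (I.UsesVPR && I.NonPredicateVPRUse)
      return false;
  }
  return true;
}
```

Keeping the "should this be inspected" predicate separate from the rejection rule mirrors the bullet points: one helper decides what is looked at, the other decides what is disallowed.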
Sep 18 2020
original was only a single loop?
Do you mean a single block loop? If so, I thought the vectorizer would perform if-conversion after calling this? And for saturation, does the vectorizer not yet consume saturating intrinsics? Are all of ours ones that merge/retain previous values? Maybe their idiom could be checked for.
Sep 17 2020
Sep 16 2020
- Simplified ValidateMVEInst further by no longer differentiating between main and secondary VCTPs.
Thanks, the logic does simplify quite a bit.
Removed all the unnecessary changes.
Cheers, now comparing against 'Parent'.
Cheers, I will definitely commit an NFC and rebase. I've extracted the validForTailPredication into another patch too: D87753.