I've added the test cases from PR40581. Test v0 does not trigger yet, test v1 triggers. I propose adding support for v0 once we've got something in-tree.
Looks like a good fix to me.
Arg, silly! Thanks for letting me know.
With Dave's and Oliver's permission I am commandeering this because I really would like to see this getting committed soonish and I have some bandwidth to progress this.
Tue, Sep 29
If Sam has no further questions, this looks good to me.
Mon, Sep 28
Thanks Dave. With D88086 committed now, I don't think there's anything in our way anymore.
Fri, Sep 25
I am so happy that this approach works! I.e., this determines equality of TC and ElementCount by calculating two SCEV expressions, subtracting them, and testing the result for 0. A check for the base of the AddRec has also been added now, so I think this addresses all comments.
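For readers following along, here is a minimal numeric sketch of the "subtract and test for 0" idea, using the relation TC == (ElementCount + VW - 1) / VW discussed in this review. It is an illustration only: the actual patch works on symbolic SCEV expressions, and the function names below are hypothetical, not taken from the patch.

```python
def expected_trip_count(element_count: int, vector_width: int) -> int:
    """Number of vector iterations needed to cover element_count elements.

    This is the rounded-up division (ElementCount + VW - 1) / VW.
    """
    return (element_count + vector_width - 1) // vector_width


def trip_count_matches(trip_count: int, element_count: int, vector_width: int) -> bool:
    """Compare two quantities by subtracting them and testing the result for 0.

    Hypothetical helper: the real check does this on SCEV expressions,
    not on concrete integers.
    """
    difference = trip_count - expected_trip_count(element_count, vector_width)
    return difference == 0


if __name__ == "__main__":
    # 10 elements with a vector width of 4 need 3 vector iterations.
    print(trip_count_matches(3, 10, 4))
    # A trip count of 2 would leave elements unprocessed, so the check fails.
    print(trip_count_matches(2, 10, 4))
```

With concrete integers the subtraction is trivial; the point of doing it on SCEV expressions is that the difference can fold to the constant 0 even when ElementCount is only known symbolically.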
Thu, Sep 24
I wanted to write the new checks in a separate patch, as I thought it would be a new lump of code and I wanted to get this clean-up out of the way first, but given our latest idea it is probably best to continue here. I.e., the TC == (ElemCount+VW-1) / VW check is hopefully just a minor addition.
Okidoki, nice one
Thanks, perfectly clear, LGTM.
Looks good, but ignoring the nits I have one question inlined that asks about explaining why we are doing this, and am interested to have a read first.
Actually, I guess if you could prove that the tripcount is precisely equal to (ElementCount + VectorWidth - 1)/VectorWidth, you could also use that to prove the subtraction doesn't overflow.
This sounds like the same suggestion that I made many moons ago... I suggested taking these values, substituting them into the expected SCEV expression, and then performing some SCEV algebra on it and the vector TC expression until, hopefully, both sides reduce to ElementCount == ElementCount. My quick prototype 'worked', but I don't know if that says much.
Wed, Sep 23
Thanks for looking Eli.
Tue, Sep 22
Sorry, I wrote a reply end of last week, but apparently forgot to push submit. So please see my reply inline, but I will open a new review soon, where it's probably best to continue this discussion and my reply.
Thanks, nice one.
Mon, Sep 21
Sounds like you've got your environment all set up. Would it be easy for you to quickly test the changes that you suggested earlier?
Sorry, I'm involved in slightly different activity atm. Will do whenever I can.
- pruned the test case, added comments for the different instruction categories to make the test more readable,
- fixed latencies for the FDIV and FSQRT instructions,
- regarding @evgeny777's comment "WriteLD should be 3 cycles, not 4": I have kept it as it was, because in a first benchmark run I tried, this regressed things a bit.
FYI: I have created https://reviews.llvm.org/D88017 to enable the model.
I will now look at the optimisation guide, correct the obvious mistakes, and create a diff for that, unless @evgeny777 you get there first, let me know.
I have committed a first version that we can now iterate on; it is not enabled/used yet.
We could perhaps commit it without enabling it to begin with.
@evgeny777 : how about we commit this to have a baseline that we iterate on?
Thu, Sep 17
Wed, Sep 16
Thanks Dave, just for completeness, uploading a new diff with the codegen changes gone, which shouldn't have been there.
Thanks for that. There was a lot going on before, but this now looks like a small, nice change.
I am also not a debug expert, but this looks like an "innocent" patch to me that makes things a bit better, so that's good. What I am wondering about, though, is why setDebugLoc isn't done in the constructor, which would make the code cleaner here and also mean it can't be forgotten. But I don't want to make this bigger than it is, and since I have never really looked into debug info, I also don't know if there would be any disadvantages to doing that. Perhaps others can comment on that.
I understood there is an NFC and a non-NFC part of this patch. Is it worth separating these out?
Tue, Sep 15
Little non-urgent ping, but would be nice to get this little guy out of the way.
Mon, Sep 14
Test case clean-up.
Thanks for that, and agreed with your remarks. I think this is already a bit more generic/flexible and thus better than what we had, but it certainly isn't fully generic. I am willing to review this once that becomes important. Then, this logic has to be moved to ScalarEvolution and be made generic.
Thu, Sep 10
Cheers, comments addressed.
Wed, Sep 9
This is a (partial) rewrite of the patch after we changed the semantics of get.active.lane.mask to accept the loop tripcount as its second argument, and not the backedge-taken count. This now implements several checks to see if the tripcount belongs to this loop.
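As a rough illustration of the semantics referred to above, the sketch below models get.active.lane.mask with the trip count as the second argument: lane i is active when base + i is strictly less than that count. It also shows, as my understanding of the earlier definition, the variant that took the backedge-taken count with a less-than-or-equal comparison. The Python function names are hypothetical, and the model ignores the unsigned, fixed-width arithmetic of the real intrinsic.

```python
from typing import List


def active_lane_mask(base: int, trip_count: int, vector_width: int) -> List[bool]:
    """Model of the mask with the trip count as second argument.

    Lane i is active when base + i < trip_count.
    """
    return [base + lane < trip_count for lane in range(vector_width)]


def active_lane_mask_btc(base: int, backedge_taken_count: int, vector_width: int) -> List[bool]:
    """Model of the earlier form (assumed here): second argument is the
    backedge-taken count, compared with <= instead of <."""
    return [base + lane <= backedge_taken_count for lane in range(vector_width)]


if __name__ == "__main__":
    # Final vector iteration of a 10-element loop with 4 lanes:
    # elements 8 and 9 are in range, 10 and 11 are not.
    print(active_lane_mask(8, 10, 4))
    # The same iteration expressed with the backedge-taken count (10 - 1 = 9).
    print(active_lane_mask_btc(8, 9, 4))
```

Both forms give the same mask when the backedge-taken count is the trip count minus one; the trip-count form avoids having to reason about that extra subtraction.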
Tue, Sep 8
Rebased after pre-committing the test, of which I've changed the function name too.
Mon, Sep 7
There are some tests for 64bit reductions. We will probably want to enable inloop reductions for them in the future too, as we have the instructions. That will require a lot of costmodel improvements though.
Hi Luke, thanks for sharing your thoughts. I agree with your analysis. The in-tree vector extension that I am aware of that supports first faulting loads is Arm's SVE. While I work on Arm's MVE, I hope and think this is useful for SVE (and other targets) too, i.e. I think ffirst mask capability can be used. But since the devil is in the details here, an implementation would need to prove this. Hopefully that happens soon.
Looks good to me.
Aug 28 2020
Before we flip the switch, can you give an impression of the performance impact of this? Does this not regress cases, is it overall a win, etc.?
Aug 27 2020
Sorry for kind of asking the usual testing question... but I was curious if there's a negative test with a pattern where its condition isn't met, so alignment < 4, if that makes sense.
Ah yes, thanks Eli!
This is reverted in rG1d8af682ef1d, and I will move this to Lint.