Thanks, nice one.
Sep 22 2020
Sep 21 2020
Sounds like you've got your environment all set up. Would it be easy for you to quickly test the changes that you suggested earlier?
Sorry, I'm involved in a slightly different activity atm. Will do whenever I can.
- pruned the test case, added comments for the different instruction categories to make the test more readable,
- fixed latencies for the FDIV and FSQRT instructions,
- regarding @evgeny777's comment "WriteLD should be 3 cycles, not 4": I have kept it as it was, because in a first benchmark run I tried, changing it regressed things a bit.
FYI: I have created https://reviews.llvm.org/D88017 to enable the model.
I will now look at the optimisation guide, correct the obvious mistakes, and create a diff for that. @evgeny777, if you get there first, let me know.
I have committed a first version that we can now iterate on; it is not enabled/used yet.
We could perhaps commit it without enabling it to begin with.
@evgeny777 : how about we commit this to have a baseline that we iterate on?
Sep 17 2020
Sep 16 2020
Thanks Dave, just for completeness, uploading a new diff with the codegen changes gone, which shouldn't have been there.
Thanks for that. There was a lot going on before, but this now looks like a small, nice change.
I am also not a debug expert, but this looks like an "innocent" patch to me that makes things a bit better, so that's good. What I am wondering about, though, is why setDebugLoc isn't done in the constructor, which would make the code cleaner here and would also mean it can't be forgotten. But I don't want to make this bigger than it is, and since I have never really looked into debug info, I also don't know if there would be any disadvantages to doing that. Perhaps others can comment on that.
I understood there is an NFC and a non-NFC part of this patch. Is it worth separating these out?
Sep 15 2020
Little non-urgent ping, but would be nice to get this little guy out of the way.
Sep 14 2020
test case clean up.
Thanks for that, and agreed with your remarks. I think this is already a bit more generic/flexible and thus better than what we had, but it certainly isn't fully generic. I am willing to review this once that becomes important. At that point, this logic has to be moved to ScalarEvolution and be made generic.
Sep 10 2020
Cheers, comments addressed.
Sep 9 2020
This is a (partial) rewrite of the patch after we changed the semantics of get.active.lane.mask to accept the loop tripcount as its second argument, and not the backedge-taken count. This now implements several checks to see if the tripcount belongs to this loop.
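To illustrate the difference between the two conventions, here is a minimal sketch in Python. It assumes the intrinsic computes lane i as an unsigned compare of (base + i) against the second argument, with the addition not wrapping, which is how I read the LangRef description. The helper name active_lane_mask and the example numbers are made up for illustration and are not part of the patch.

```python
def active_lane_mask(base, n, vf):
    """Model of the lane mask: lane i is active when base + i < n.

    base, n are treated as non-negative (unsigned) integers; Python ints
    do not wrap, which mirrors the "no overflow in base + i" assumption.
    vf is the number of vector lanes.
    """
    return [base + i < n for i in range(vf)]


# Hypothetical loop with 10 iterations, vectorised with 4 lanes.
trip_count = 10
vf = 4
last_base = 8  # induction value at the start of the final vector iteration

# Passing the trip count keeps the two remaining elements (8 and 9) active.
print(active_lane_mask(last_base, trip_count, vf))

# Passing the backedge-taken count (trip_count - 1) under the same compare
# would wrongly switch off element 9.
print(active_lane_mask(last_base, trip_count - 1, vf))
```

This is why the checks need to establish that the second argument really is the trip count of this loop: with an unrelated or off-by-one value the mask would silently enable or disable the wrong lanes in the final iteration.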
Sep 8 2020
Rebased after pre-committing the test, of which I've changed the function name too.
Sep 7 2020
There are some tests for 64-bit reductions. We will probably want to enable in-loop reductions for them in the future too, as we have the instructions. That will require a lot of cost model improvements though.
Hi Luke, thanks for sharing your thoughts. I agree with your analysis. The in-tree vector extension that I am aware of that supports first faulting loads is Arm's SVE. While I work on Arm's MVE, I hope and think this is useful for SVE (and other targets) too, i.e. I think ffirst mask capability can be used. But since the devil is in the details here, an implementation would need to prove this. Hopefully that happens soon.
Looks good to me.
Aug 28 2020
Before we flip the switch, can you give an impression of the performance impact of this? Does this not regress cases, is it overall a win, etc.?
Aug 27 2020
Sorry for asking the usual testing question, but I was curious if there's a negative test with a pattern where the condition isn't met, so alignment < 4, if that makes sense.
Ah yes, thanks Eli!
This is reverted in rG1d8af682ef1d, and I will move this to Lint.
Aug 26 2020
That's a good amount of red/deletions!
Read this for the first time, and looks very reasonable. I have one question inline for now while I go over this again.
Thanks Dave, I will fix the spelling before committing.
I was just writing that I understood creating a small test case for this one was very difficult; was that this case? Looks like Sam has one now... should it not be part of this change? But anyway, it's fine I guess.
One minor addition, this change is now tested for X86, but I thought it wouldn't do any harm to test this for ARM too. If you can and want to touch the ARM backend test, I wouldn't mind if you e.g. add test @create_mask7 to llvm/test/CodeGen/Thumb2/active_lane_mask.ll, otherwise I will just add that once this lands.
LGTM. Thanks for fixing this.
This check isn't appropriate. Given that the second argument to @llvm.get.active.lane.mask.v4i1.i32 is a variable, in general, it isn't appropriate to print an error in cases where we can prove the value is 0. You can add a Lint check if you want.
Aug 25 2020
Cheers, that's indeed a bit shorter.
Thanks for catching that.
(weird, thought I had fixed that, but cheers)
Aug 24 2020
Yep, thanks, now with extra tests for (i32 1) and (i32 -1).
Okidoki, now with that change.