- User Since
- Jan 26 2016, 2:17 AM (242 w, 3 d)
Wed, Sep 16
Thanks Dave, just for completeness, uploading a new diff with the codegen changes gone, which shouldn't have been there.
Thanks for that. There was a lot going on before, but this now looks like a small, nice change.
I am also not a debug expert, but this looks like an "innocent" patch to me that makes things a bit better, so that's good. What I am wondering about though why setDebugLoc isn't done in the constructor, which makes the code cleaner here and also it won't be forgotten. But I don't want to make this bigger than it is, and since I have never really looked into debug info, I also don't know if there would be any disadvantages doing that. Perhaps others can comment on that.
I understood there is a NFC and non-NFC part of this patch. Is worth separating this out?
Tue, Sep 15
Little non-urgent ping, but would be nice to get this little guy out of the way.
Mon, Sep 14
test case clean up.
Thanks for that, and agreed with your remarks. I think this is already a bit more generic/flexible and thus better than what we had, but certainly isn't fully generic. I am willing to review this once that becomes important. Then, this logic has to be moved to Scalarevolution and be made generic.
Thu, Sep 10
Cheers, comments addressed.
Wed, Sep 9
This is a (partial) rewrite of the patch after we changed the semantics of get.active.lane.mask to accept the loop tripcount as its second argument, and not the backedge-taken count. This now implements several checks to see if the tripcount belongs to this loop.
Tue, Sep 8
Rebased after pre-committing the test, of which I've changed the function name too.
Mon, Sep 7
There are some tests for 64bit reductions. We will probably want to enable inloop reductions for them in the future too, as we have the instructions. That will require a lot of costmodel improvements though.
Hi Luke, thanks for sharing your thoughts. I agree with your analysis. The in-tree vector extension that I am aware of that supports first faulting loads is Arm's SVE. While I work on Arm's MVE, I hope and think this is useful for SVE (and other targets) too, i.e. I think ffirst mask capability can be used. But since the devil is in the details here, an implementation would need to prove this. Hopefully that happens soon.
Looks good to me.
Fri, Aug 28
Before we flip the switch, can you give an impression of the performance impact of this? Does this not regress cases, is it overall a win, etc.?
Thu, Aug 27
Sorry for kind of asking the usual testing question....but was curious if there's a negative test with a pattern where its condition isn't met, so alignment < 4, if that makes sense.
Ah yes, thanks Eli!
This is reverted in rG1d8af682ef1d, and I will move this to Lint.
Wed, Aug 26
That's good amount of red/deletions!
Read this for the first time, and looks very reasonable. I have one question inline for now while I go over this again.
Thanks Dave, I will fix the spelling before committing.
I was just writing that I understood that creating a test for this one was very difficult? I.e., creating a small test case, was that this case? Looks like Sam has one now......should it not be part of this change? But anyway, it's fine I guess.
One minor addition, this change is now tested for X86, but I thought it wouldn't do any harm to test this for ARM too. If you can and want to touch the ARM backend test, I wouldn't mind if you e.g. add test @create_mask7 to llvm/test/CodeGen/Thumb2/active_lane_mask.ll, otherwise I will just add that once this lands.
LGTM. Thanks for fixing this.
Tue, Aug 25
Cheers, that's indeed a bit shorter.
Thanks for catching that.
(weird, thought I had fixed that, but cheers)
Mon, Aug 24
Yep, thanks, now with extra tests for (i32 1) and (i32 -1).
Okidoki, now with that change.
Fri, Aug 21
Thanks for looking Dave.
I am addressing your remarks, also find some replies inline.
Now with all tests fixed.