User Details
- User Since: Dec 20 2015, 4:54 AM (404 w, 4 d)
Aug 8 2021
addressed clang-format issue
Aug 7 2021
Addressed tidy warnings
Aug 5 2021
rebased
Jun 28 2021
Will get back to pushing this patch in a few weeks (unfortunately didn't make it before going away on vacation).
Jun 27 2021
updated a comment, as pointed out in the review
Jun 24 2021
Thanks, Ayal! Incorporated all your comments.
Addressed review comments.
Jun 23 2021
(accidentally uploaded without context)
updated formatting
Jun 22 2021
Oct 17 2019
Aug 15 2019
Aug 13 2019
Thanks for taking a look! Please see responses below.
Aug 12 2019
Jan 31 2019
Oct 31 2018
Oct 29 2018
Comments addressed. Thanks!
Addressed comments.
Also added a test with stride 3.
Oct 24 2018
Just minor comments on the tests.
LGTM.
Oct 23 2018
Oct 21 2018
Addressed comments. See a couple of responses below. Thanks!
Oct 20 2018
updated to top of trunk.
Oct 19 2018
Oct 14 2018
Oct 10 2018
Comment addressed, thanks.
Oct 9 2018
Addressed Ayal's comments. Thanks!
Oct 6 2018
Thanks!
Sep 28 2018
Mar 7 2018
Hopefully I can delegate the review to Diego...
Thanks for the fix, Andrei
Jan 14 2018
LGTM. Thanks for the fix.
Dec 14 2017
Thanks so much for all your help with this work!
Dec 13 2017
Dec 12 2017
Addressed Ayal's and Silviu's comments.
Dec 10 2017
Dec 7 2017
ping
Dec 6 2017
Thanks Florian. Uploaded the formatting fix.
Hi Ayal,
Nov 30 2017
Nov 28 2017
(uploaded a fix to LoopUtils:getCastsForInductionPHI())
Hi Silviu,
Nov 22 2017
Hi Silviu,
Nov 20 2017
Hi Silviu,
I started to try out the approach you suggested, and I realized that our assumption doesn't hold... (see response to inlined comment).
Thanks,
Dorit
Nov 19 2017
Addressed Ayal's comments.
Have yet to address Silviu's comments.
Yes, IndVarSimplify wouldn't fix this issue, but I was thinking more of using the techniques there that use the SCEV expressions to find these cases instead of doing the pattern matching (see the inline comment).
Hi Silviu,
Nov 16 2017
Thanks Ayal. Incorporated your suggestions.
ping^2
Nov 15 2017
You missed just one mcpu=skylake :)
LGTM with this change
I think it would be nice to make the testcases smaller. Right now you have something like this:
for (…) {
Dst[2*i] = Dst[2*i] + Src[2*i] * k
Dst[2*i+1] = Dst[2*i+1] + Src[2*i+1] * k
}
...which actually tests both strided loads and strided stores.
So you could either use one test to check both store and load costs (and even then you probably don't need both a mul and an add just to check memops costs).
Or if you want to separate the load and store cases, the Load test could be something like:
for (…) {
s += Src[2*i]
s += Src[2*i+1]
}
The Store test could be something like:
for (…) {
Dst[2*i] = k1
Dst[2*i+1] = k2
}
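For reference, the three loop shapes discussed above could be written out in C roughly as follows. This is only a sketch: the function names, the `int` element type, and the explicit trip count `n` are assumptions made for illustration, since the original loops elide their bounds and types.

```c
#include <stddef.h>

/* Combined case: stride-2 loads from Src and Dst, stride-2 stores to Dst.
   Exercises both strided-load and strided-store costs in a single loop. */
void interleaved_load_store(int *Dst, const int *Src, int k, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    Dst[2 * i] = Dst[2 * i] + Src[2 * i] * k;
    Dst[2 * i + 1] = Dst[2 * i + 1] + Src[2 * i + 1] * k;
  }
}

/* Load-only case: a reduction over both members of each stride-2 group,
   so the loop contains no stores at all. */
int strided_load_only(const int *Src, size_t n) {
  int s = 0;
  for (size_t i = 0; i < n; ++i) {
    s += Src[2 * i];
    s += Src[2 * i + 1];
  }
  return s;
}

/* Store-only case: loop-invariant values written to both members of each
   stride-2 group, so the loop contains no loads at all. */
void strided_store_only(int *Dst, int k1, int k2, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    Dst[2 * i] = k1;
    Dst[2 * i + 1] = k2;
  }
}
```

Splitting the cases this way keeps each test focused on one kind of memory operation, which is the point of the suggestion above.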
Nov 7 2017
ping :)
Nov 5 2017
Incorporated Ayal's comments. Thanks!
Nov 2 2017
Hi Silviu,
Oct 18 2017
Oct 17 2017
LGTM with the last couple of comments.
AVX512 side of things now also looks good to me (with the tiny comments below).
Oct 16 2017
The AVX2 changes look ok to me now.
A couple comments about the AVX512 changes below.
Oct 11 2017
Hi Silviu,
Oct 10 2017
My main concerns are with interleave groups with gaps (see below), and with the fact that we don't distinguish the AVX2 case from the AVX512 case (also see below). Just minor comments beyond that.