Page MenuHomePhabricator

ashahid (Shahid)Email Not Verified
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 14 2015, 8:24 AM (235 w, 17 h)

Recent Activity

Mon, Oct 7

ashahid closed D36130: [SLP] Vectorize jumbled memory loads..
Mon, Oct 7, 5:02 AM · Restricted Project

Mar 6 2018

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Mar 6 2018, 9:09 PM · Restricted Project
ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Mar 6 2018, 8:22 AM · Restricted Project

Mar 5 2018

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Mar 5 2018, 10:40 PM · Restricted Project

Feb 28 2018

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Feb 28 2018, 11:31 PM · Restricted Project
ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hope this is fine.

Feb 28 2018, 9:35 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updated further review comments.

Feb 28 2018, 9:30 AM · Restricted Project

Feb 27 2018

ashahid committed rL326303: [SLP] Added new tests and updated existing for jumbled load, NFC..
[SLP] Added new tests and updated existing for jumbled load, NFC.
Feb 27 2018, 8:23 PM
ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Will commit the tests as NFC.

Feb 27 2018, 8:26 AM · Restricted Project
ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

As suggested, now the reordering mask will be part of each tree entry. Also this update does not consider to optimize the reordered load for multiple operand for now.

Feb 27 2018, 6:41 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updated the patch to accomodate the review comments.

Feb 27 2018, 6:29 AM · Restricted Project

Feb 16 2018

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Feb 16 2018, 9:46 AM · Restricted Project

Feb 14 2018

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hi Alexey,

Feb 14 2018, 2:38 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Minor clean up.

Feb 14 2018, 2:34 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updates review comments and a test case.

Feb 14 2018, 1:38 AM · Restricted Project

Feb 13 2018

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hi Alexey,

Feb 13 2018, 6:47 AM · Restricted Project
ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hi Alexey,

Feb 13 2018, 1:51 AM · Restricted Project

Feb 10 2018

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hi Ayal, Sanjoy,

Feb 10 2018, 8:35 AM · Restricted Project

Jan 26 2018

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Ping!

Jan 26 2018, 7:54 AM · Restricted Project

Jan 16 2018

ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updates test case and stylistic review comments

Jan 16 2018, 8:51 AM · Restricted Project

Jan 15 2018

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Jan 15 2018, 9:01 PM · Restricted Project

Jan 11 2018

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#971181, @Ayal wrote:

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

Jan 11 2018, 8:24 AM · Restricted Project

Jan 8 2018

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Ping!

Jan 8 2018, 6:46 PM · Restricted Project

Jan 1 2018

ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updated Ayal's comment accordingly

Jan 1 2018, 7:59 AM · Restricted Project

Dec 29 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Regression test and LNT passes, 3 stage bootstrap test underway.

Dec 29 2017, 12:10 AM · Restricted Project

Dec 28 2017

ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updates review comments.

Dec 28 2017, 11:04 PM · Restricted Project
ashahid reopened D36130: [SLP] Vectorize jumbled memory loads..
Dec 28 2017, 11:04 PM · Restricted Project

Dec 22 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Dec 22 2017, 6:20 AM · Restricted Project

Dec 20 2017

ashahid committed rL321181: Revert r320548:[SLP] Vectorize jumbled memory loads.
Revert r320548:[SLP] Vectorize jumbled memory loads
Dec 20 2017, 7:28 AM

Dec 19 2017

ashahid added a comment to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.
In D41324#959193, @Ayal wrote:

Presumably this fixes the reported regressions?

Cf. the test added in D40883 to check if a cost is accounted for (or not).

This test was meant for Loop Vectorizer, do you want me to check this for SLP or I may be missing something.

Dec 19 2017, 8:42 AM

Dec 18 2017

ashahid added a comment to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.

Lit test done, LNT test underway.

Dec 18 2017, 9:16 AM
ashahid updated the diff for D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.

Review comment updated accordingly.

Dec 18 2017, 8:54 AM
ashahid added inline comments to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.
Dec 18 2017, 8:03 AM
ashahid added a comment to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.

Regression and LNT test was successful. Bootstrap test is underway.

Hi Shahid,

Do you mean the LNT test code is back to the code before the patch D36130?

Thanks,
Evgeny

Dec 18 2017, 7:53 AM

Dec 17 2017

ashahid added a comment to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.

Ping!

Dec 17 2017, 6:52 PM

Dec 16 2017

ashahid added a comment to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.

Regression and LNT test was successful. Bootstrap test is underway.

Dec 16 2017, 9:30 AM
ashahid updated the summary of D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.
Dec 16 2017, 9:28 AM
ashahid updated the summary of D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.
Dec 16 2017, 9:25 AM
ashahid created D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.
Dec 16 2017, 9:22 AM

Dec 12 2017

ashahid committed rL320548: [SLP] Vectorize jumbled memory loads..
[SLP] Vectorize jumbled memory loads.
Dec 12 2017, 7:09 PM
ashahid closed D36130: [SLP] Vectorize jumbled memory loads..
Dec 12 2017, 7:09 PM · Restricted Project
ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Bootstrap and LNT test underway.

Dec 12 2017, 10:13 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updated test and review comment.

Dec 12 2017, 10:01 AM · Restricted Project

Dec 11 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Dec 11 2017, 8:41 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Review comments updated and added lit tests.

Dec 11 2017, 8:34 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Review comments updated and added lit tests.

Dec 11 2017, 7:47 AM · Restricted Project

Dec 9 2017

ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Minor commented code clean up done.

Dec 9 2017, 12:56 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updated the review comments.

Dec 9 2017, 12:50 AM · Restricted Project

Dec 8 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Dec 8 2017, 8:12 AM · Restricted Project

Dec 6 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#945728, @Ayal wrote:

Good catch. Add a LIT test?

Dec 6 2017, 1:17 AM · Restricted Project
ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#945306, @hans wrote:

Patch update for fixing build bot failure:

I haven't looked at the patch at all, but I just tried it on a local Chrome build on Linux, and it seems to work for that.

Dec 6 2017, 1:02 AM · Restricted Project

Dec 4 2017

ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Patch update for fixing build bot failure:

Dec 4 2017, 9:17 PM · Restricted Project

Oct 3 2017

ashahid reopened D36130: [SLP] Vectorize jumbled memory loads..

Opening as buid bot broken again... its embarrassing:(

Oct 3 2017, 9:51 AM · Restricted Project
ashahid committed rL314806: [SLP] Vectorize jumbled memory loads..
[SLP] Vectorize jumbled memory loads.
Oct 3 2017, 8:30 AM
ashahid closed D36130: [SLP] Vectorize jumbled memory loads. by committing rL314806: [SLP] Vectorize jumbled memory loads..
Oct 3 2017, 8:30 AM · Restricted Project

Oct 2 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Oct 2 2017, 2:17 AM · Restricted Project

Oct 1 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

The regression test result is as follows. There are 2 failures coming from Clang however these failures are also observed without this patch.

Oct 1 2017, 1:49 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..

Updated the patch to fix the assertion 'Extracting from gather list'.

Oct 1 2017, 1:40 AM · Restricted Project

Sep 28 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

PING!!

Sep 28 2017, 12:23 AM · Restricted Project

Sep 26 2017

ashahid reopened D36130: [SLP] Vectorize jumbled memory loads..

This patch has been reverted in revision r313771 due to an assertion from the test "test/Transforms/SLPVectorizer/AArch64/gather-root.ll" as I missed to enable AArch64 target.

Sep 26 2017, 3:29 PM · Restricted Project
ashahid updated subscribers of D36130: [SLP] Vectorize jumbled memory loads..
Sep 26 2017, 3:29 PM · Restricted Project

Sep 20 2017

ashahid committed rL313771: [SLP] Vectorize jumbled memory loads..
[SLP] Vectorize jumbled memory loads.
Sep 20 2017, 10:21 AM
ashahid committed rL313736: [SLP] Vectorize jumbled memory loads..
[SLP] Vectorize jumbled memory loads.
Sep 20 2017, 1:20 AM
ashahid closed D36130: [SLP] Vectorize jumbled memory loads. by committing rL313736: [SLP] Vectorize jumbled memory loads..
Sep 20 2017, 1:20 AM · Restricted Project

Sep 19 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Fixed the issue to consider the respective operand while recording the mask for user tree entry. Added the test which you have used to show this issue.

Sep 19 2017, 12:41 AM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..
  • Review comments updated accordingly
  • Added a TODO for sortLoadAccesses API
  • Modified the TODO for sortLoadAccesses API
  • Review comment update for using OpdNum to insert the mask in respective location
Sep 19 2017, 12:34 AM · Restricted Project

Sep 10 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#863237, @Ayal wrote:

Yes, thanks, the example is clear. I agree that having OpdNums allows to represent such cases where two shuffles from the same load are to feed two distinct operands of the same user. But the SLP vectorizer with this patch alone will not optimize such patterns, right? Take for example jumbled-load.ll above; to match the case depicted in the pdf, replace the zeroes used as the 2nd operands of the cmp's and/or of the select's by a shuffle (same as that of the 1st operand or different).

Yes, that's right. However right now I am not clear what needs to be done to optimize those pattern. I also tried another simple case, where 2 different Shuffle from same LOAD is fed into
a MUL. Here, for VF=4, SLP reports "Scalar used twice in bundle" and removes redundant scalar operations instead of vectorization. Consequently, the STOREs of the results of the MULs does not get vectorized. May be we need to do a trade-off between scalar vs vector code here.

Sep 10 2017, 11:41 PM · Restricted Project

Sep 5 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Sep 5 2017, 10:31 PM · Restricted Project

Sep 3 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Sep 3 2017, 9:48 PM · Restricted Project

Aug 31 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Aug 31 2017, 9:58 PM · Restricted Project

Aug 29 2017

ashahid added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Aug 29 2017, 7:34 PM · Restricted Project

Aug 27 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Updated the review comments accordingly

Aug 27 2017, 11:32 PM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..
  • Modified the TODO for sortLoadAccesses API
Aug 27 2017, 11:13 PM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..
  • Added a TODO for sortLoadAccesses API
Aug 27 2017, 11:01 PM · Restricted Project
ashahid updated the diff for D36130: [SLP] Vectorize jumbled memory loads..
  • Review comments updated accordingly
Aug 27 2017, 10:47 PM · Restricted Project

Aug 22 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#848516, @Ayal wrote:
In D36130#847002, @Ayal wrote:

sortMemAccesses() is analogous to the formation of InterleaveGroups in the LoopVectorizer, which also scans a collection of Loads (or Stores) to determine if they are adjacent in some order and can be combined into one Vector Load of a given width; and if so, in what order. This requires a single scan to compute the distances relative to the first access, as done here. But knowing that we're looking for a permutation of a given width, we can more easily sort the accesses as they are entered into a map, holding the minimum and maximum indices. See insertMember() there.

Thanks Ayal for your comments.
As this is a long pending work and my current focus is to add this feature in SLP. I welcome your suggestions and would like to consider in a separate patch after this.

OK, you can mention this improvement in a TODO comment for now.

Aug 22 2017, 9:00 AM · Restricted Project

Aug 21 2017

ashahid added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#847002, @Ayal wrote:

sortMemAccesses() is analogous to the formation of InterleaveGroups in the LoopVectorizer, which also scans a collection of Loads (or Stores) to determine if they are adjacent in some order and can be combined into one Vector Load of a given width; and if so, in what order. This requires a single scan to compute the distances relative to the first access, as done here. But knowing that we're looking for a permutation of a given width, we can more easily sort the accesses as they are entered into a map, holding the minimum and maximum indices. See insertMember() there.

The alternative of representing such Shuffles as TreeNodes by themselves, rather than as side-effects of their Loads' TreeNodes, would probably requires a more abstract model of the SLP Tree; they are not an ordinary packing of existing Instructions.

Additional minor comments inline.

Aug 21 2017, 10:00 AM · Restricted Project

Aug 9 2017

ashahid updated the summary of D36130: [SLP] Vectorize jumbled memory loads..
Aug 9 2017, 9:14 AM · Restricted Project

Aug 1 2017

ashahid added reviewers for D36130: [SLP] Vectorize jumbled memory loads.: mkuper, loladiro.
Aug 1 2017, 12:38 AM · Restricted Project
ashahid created D36130: [SLP] Vectorize jumbled memory loads..
Aug 1 2017, 12:31 AM · Restricted Project

Jul 31 2017

ashahid committed rL309544: [SLP]: Add test to resurrect the jumbled load patch. This test has multiple uses.
[SLP]: Add test to resurrect the jumbled load patch. This test has multiple uses
Jul 31 2017, 12:41 AM

Mar 3 2017

ashahid committed rL296863: [SLP] Fixes the bug due to absence of in order uses of scalars which needs to….
[SLP] Fixes the bug due to absence of in order uses of scalars which needs to…
Mar 3 2017, 2:15 AM
ashahid closed D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386 by committing rL296863: [SLP] Fixes the bug due to absence of in order uses of scalars which needs to….
Mar 3 2017, 2:14 AM
ashahid added inline comments to D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.
Mar 3 2017, 1:22 AM

Mar 2 2017

ashahid updated the diff for D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.

Updated the patch to do a proper lane computation for external uses.

Mar 2 2017, 9:23 PM

Feb 28 2017

ashahid committed rL296575: [SLP] Fixes the bug due to absence of in order uses of scalars which needs to….
[SLP] Fixes the bug due to absence of in order uses of scalars which needs to…
Feb 28 2017, 8:03 PM
ashahid closed D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386 by committing rL296575: [SLP] Fixes the bug due to absence of in order uses of scalars which needs to….
Feb 28 2017, 8:03 PM
ashahid added inline comments to D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.
Feb 28 2017, 2:33 AM

Feb 27 2017

ashahid updated the diff for D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.

Updated the path to incorporate review comments.

Feb 27 2017, 10:39 PM
ashahid added inline comments to D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.
Feb 27 2017, 10:36 PM

Feb 23 2017

ashahid updated the diff for D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.

Updated the patch to incorporate the review comments.

Feb 23 2017, 9:30 PM

Feb 21 2017

ashahid added a comment to D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.

This is getting a bit hairy, I think - we have too many valuelists flying around.

How about something like this:
When we sort the vectors, we can also use the sort to generate the corresponding shuffle mask (instead of recomputing it in O(|VL|^2) time in line 2623 - 2634), by sorting pairs of <Value, Index>.

The basic idea to keep the mask computation at the current place was because we wanted to compute the mask, (N^2 loop), only if vectorization is found to be beneficial.

Yes, but we can avoid the O(N^2) computation altogether, right? I mean, doing that doesn't add any cost to the sort. Or am I missing something?

OTOH if we do mask computation while sorting memory accesses, the utility of SortMemAccesses API may be reduced, however this is not a bigger problem and can be
worked around by keeping a bit to decide whether to compute the mask or not.

Don't we sort before we know whether we're going to need the mask?

We can then keep an optional shuffle mask instead of the NeedToShuffle bit as part of the tree node, and use it to get the correctly ordered list of scalars back from VectorizableTree[0].Scalars when we need to.

What do you think? My gut feeling is that keeping the mask around is worth it just to avoid the N^2 loop in 2623, and the fix to the bug falls out of it.

Although I am not an expert, I think keeping another ValueList for unordered scalars is a not a bad design. However I am open to other views.

It seems odd to have both the sorted and the unsorted list. The other thing that bothers me is that we have some invariants which don't seem to be well-defined. Is the guarantee that "Scalars" is ordered to have the "best" order to allow vectorization, while InOrdScalars is ordered according to the original IR order? Also, preserving "program order" is kind of meaningless. The fact we care about order at all is just an artifact of the way we build the tree. What we really care about is mapping each scalar leaf to the vector lane it eventually ends up in, which is precisely the shuffle mask.

Anyway, I'm also open to other opinions, if more people care about this.
What I'm really trying to do here is to make the code make a bit more sense - right now, it makes almost zero sense to me. (This isn't the fault of your patch, I'm talking about the general state of the SLP vectorizer).

Feb 21 2017, 11:48 PM

Feb 20 2017

ashahid added a comment to D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.

This is getting a bit hairy, I think - we have too many valuelists flying around.

How about something like this:
When we sort the vectors, we can also use the sort to generate the corresponding shuffle mask (instead of recomputing it in O(|VL|^2) time in line 2623 - 2634), by sorting pairs of <Value, Index>.

Feb 20 2017, 8:56 PM

Feb 19 2017

ashahid created D30159: [SLP] Fix for a bug in jumbled memory access vectorization introduced in r293386.
Feb 19 2017, 10:24 PM

Jan 31 2017

ashahid added inline comments to D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way..
Jan 31 2017, 1:35 AM

Jan 28 2017

ashahid committed rL293386: [SLP] Vectorize loads of consecutive memory accesses, accessed in non….
[SLP] Vectorize loads of consecutive memory accesses, accessed in non…
Jan 28 2017, 10:11 AM
ashahid closed D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way. by committing rL293386: [SLP] Vectorize loads of consecutive memory accesses, accessed in non….
Jan 28 2017, 10:10 AM

Jan 27 2017

ashahid updated the diff for D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way..

Sorry for the delayed response. Updated the patch to include costing for extra shuffle and minor formatting.

Jan 27 2017, 2:37 AM

Jan 19 2017

ashahid committed rL292581: [SLP] Add a base test for jumbled store.
[SLP] Add a base test for jumbled store
Jan 19 2017, 10:16 PM

Jan 13 2017

ashahid updated the diff for D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way..

Updated the patch accordingly.

Jan 13 2017, 2:15 AM