FarhanaAleen (Farhana Aleen)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 15 2018, 8:33 AM (83 w, 3 d)

Recent Activity

Dec 13 2018

FarhanaAleen updated the diff for D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Removed adjust stack and stores.

Dec 13 2018, 11:09 PM
FarhanaAleen updated the diff for D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Removed calls from mir tests.

Dec 13 2018, 10:37 PM
FarhanaAleen added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Can you try implementing the other approach first, and then applying this on top of it to show the difference more clearly?

Dec 13 2018, 3:29 PM
FarhanaAleen updated the diff for D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Maintained 80 chars per line, added GCN-LABEL, reduced mir tests.

Dec 13 2018, 3:28 PM

Dec 12 2018

FarhanaAleen added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Can we have a mir test with more than two loads? I want to see a situation where 3 loads are foldable with the same offset, but lowest address is in the middle. I.e.:

Yes, I added two more mir tests called LowestInMiddle and NegativeDistance.

Dec 12 2018, 4:03 PM
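
The "lowest address in the middle" case discussed above can be illustrated with a small sketch. It is not the patch's algorithm: the feed does not show how D55539 picks its new base, so the helper names `fits_simm13` and `rebase_offsets` are hypothetical, and treating the 13-bit immediate as a signed range of -4096..4095 is an assumption made for illustration only.

```python
def fits_simm13(value):
    # Assumption: the 13-bit immediate is signed, i.e. -4096..4095.
    return -4096 <= value <= 4095


def rebase_offsets(offsets, anchor_index):
    """Re-express constant offsets relative to one chosen load.

    Returns (anchor_offset, deltas) when every delta fits in the assumed
    immediate range, or None when at least one does not.
    """
    anchor = offsets[anchor_index]
    deltas = [off - anchor for off in offsets]
    if all(fits_simm13(d) for d in deltas):
        return anchor, deltas
    return None


# Three loads off the same base; the lowest offset is the middle one.
lowest_in_middle = [6144, 4096, 8191]
print(rebase_offsets(lowest_in_middle, 1))
# A spread larger than the assumed range cannot be folded around load 0.
print(rebase_offsets([0, 4096, 8192], 0))
```

Anchoring on the middle load keeps every delta non-negative here; anchoring on the first load instead would give a negative delta, which is the kind of situation a NegativeDistance test would presumably exercise.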
FarhanaAleen updated the diff for D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Updated with the reviewer's comments.

Dec 12 2018, 4:03 PM
FarhanaAleen retitled D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. from [AMDGPU] Promote offset to immediate. to [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .
Dec 12 2018, 2:36 PM

Dec 11 2018

FarhanaAleen added a comment to D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .

Why aren't these matched in the first place? These shouldn't have gotten this far

Dec 11 2018, 8:24 AM

Dec 10 2018

FarhanaAleen created D55539: [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. .
Dec 10 2018, 6:20 PM

Nov 1 2018

FarhanaAleen added inline comments to D53937: [AMDGPU] Handle the idot8 pattern generated by FE.
Nov 1 2018, 10:02 AM

Oct 31 2018

FarhanaAleen created D53937: [AMDGPU] Handle the idot8 pattern generated by FE.
Oct 31 2018, 9:06 AM

Oct 2 2018

FarhanaAleen added a comment to D52520: [AMDGPU] Match signed dot4/8 pattern..

Can the tests be reduced/made more flexible? E.g., the tests previously used FileCheck variables ( [[FF:s[0-9]+]] .

Oct 2 2018, 9:48 AM

Sep 28 2018

FarhanaAleen added a reviewer for D52520: [AMDGPU] Match signed dot4/8 pattern.: msearles.
Sep 28 2018, 9:15 AM

Sep 25 2018

FarhanaAleen created D52520: [AMDGPU] Match signed dot4/8 pattern..
Sep 25 2018, 2:47 PM

Sep 17 2018

FarhanaAleen added a comment to D51947: [AMDGPU] Match udot8 pattern.

Thanks, this mostly looks good to me. Looks like this may be running into a serious limitation of the ISel infrastructure with commutativity / associativity, but it makes sense to land this patch without addressing it. I do have one last question.

Sep 17 2018, 9:14 AM

Sep 14 2018

FarhanaAleen updated the diff for D51947: [AMDGPU] Match udot8 pattern.

Updated test checks purely generated by update_llc_test_checks.

Sep 14 2018, 11:40 AM

Sep 13 2018

FarhanaAleen added inline comments to D51947: [AMDGPU] Match udot8 pattern.
Sep 13 2018, 12:25 PM

Sep 12 2018

FarhanaAleen updated the diff for D51947: [AMDGPU] Match udot8 pattern.

Thanks Nicolai.

Sep 12 2018, 4:22 PM

Sep 11 2018

FarhanaAleen created D51947: [AMDGPU] Match udot8 pattern.
Sep 11 2018, 1:41 PM

Aug 28 2018

FarhanaAleen updated the diff for D50921: [AMDGPU] Match udot4 pattern..

Defined the pattern using foldl (I was wrong, foldl does support DAG patterns).

Aug 28 2018, 9:10 AM

Aug 24 2018

FarhanaAleen added inline comments to D50921: [AMDGPU] Match udot4 pattern..
Aug 24 2018, 3:36 PM

Aug 20 2018

FarhanaAleen added a comment to D50024: [AMDGPU] Support idot2 pattern..

It looks like there are no further comments. In that case, I will go ahead and check it in.

Aug 20 2018, 3:15 PM
FarhanaAleen added a comment to D50921: [AMDGPU] Match udot4 pattern..

Can you also add tests/support for the negated form? i.e. -S0.u8[1] * S1.u8[1] - S0.u8[2] * S1.u8[2] - S0.u8[3] * S1.u8[3] - S2.u32. I'm not sure how this will canonicalize, but I don't think we do as much as we do with FP negates since we don't have int source modifiers

Aug 20 2018, 2:26 PM
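
For readers unfamiliar with the notation in the comment above, the non-negated udot4 computation can be sketched as a reference model. The function names are hypothetical, the wrap-around to 32 bits is an assumption (no clamping is modelled), and the negated form raised in the comment is deliberately not covered.

```python
MASK32 = 0xFFFFFFFF


def byte_lane(word, lane):
    # Extract unsigned byte `lane` (0 = least significant) from a 32-bit word.
    return (word >> (8 * lane)) & 0xFF


def udot4_reference(s0, s1, s2):
    """Sum of S0.u8[i] * S1.u8[i] for i in 0..3, plus S2.u32, modulo 2**32."""
    acc = s2 & MASK32
    for lane in range(4):
        acc += byte_lane(s0, lane) * byte_lane(s1, lane)
    return acc & MASK32


# Bytes of s0 are 1, 2, 3, 4; every byte of s1 is 1; accumulator is 10.
print(udot4_reference(0x04030201, 0x01010101, 10))
```

The example call evaluates 1 + 2 + 3 + 4 + 10, so it is easy to check by hand against the per-byte formula.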

Aug 17 2018

FarhanaAleen created D50921: [AMDGPU] Match udot4 pattern..
Aug 17 2018, 1:47 PM

Aug 14 2018

FarhanaAleen added inline comments to D50024: [AMDGPU] Support idot2 pattern..
Aug 14 2018, 11:20 AM
FarhanaAleen updated the diff for D50024: [AMDGPU] Support idot2 pattern..

Added:
  • a testcase with sign_extend_inreg happening on 32bit from 8bit
  • checks for SI and VI
  • update_llc_test checks
Aug 14 2018, 11:19 AM

Aug 9 2018

FarhanaAleen updated the diff for D50024: [AMDGPU] Support idot2 pattern..

If all the operations happen in 16bit, the pattern is detected. Added a testcase of that pattern.

Aug 9 2018, 10:43 AM

Aug 6 2018

FarhanaAleen added a comment to D50024: [AMDGPU] Support idot2 pattern..

But all the types are legal already?

Yes, they are legal types, but we have packed instructions that can operate on a pair of 16-bit values; therefore packed types can be treated as a 32-bit scalar type.

Aug 6 2018, 10:54 AM
FarhanaAleen updated the diff for D50024: [AMDGPU] Support idot2 pattern..

Supported the transformation using table gen patterns.

Aug 6 2018, 10:26 AM

Jul 31 2018

FarhanaAleen updated the diff for D50024: [AMDGPU] Support idot2 pattern..
  • Removed SDValue initialization
  • Removed function calls from the testcases.
  • Returned SDValue instead of bool+SDValue
Jul 31 2018, 12:47 PM
FarhanaAleen added a comment to D50024: [AMDGPU] Support idot2 pattern..

Thanks Matt.

Jul 31 2018, 12:45 PM

Jul 30 2018

FarhanaAleen created D50024: [AMDGPU] Support idot2 pattern..
Jul 30 2018, 3:47 PM

Jul 21 2018

FarhanaAleen added inline comments to D49516: [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers..
Jul 21 2018, 4:19 PM

Jul 19 2018

FarhanaAleen added inline comments to D49516: [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers..
Jul 19 2018, 1:56 PM

Jul 18 2018

FarhanaAleen created D49516: [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers..
Jul 18 2018, 4:06 PM

Jul 13 2018

FarhanaAleen added inline comments to D49146: [AMDGPU] Support a fdot2 pattern..
Jul 13 2018, 3:03 PM
FarhanaAleen updated the diff for D49146: [AMDGPU] Support a fdot2 pattern..
Jul 13 2018, 3:03 PM
FarhanaAleen updated the diff for D49146: [AMDGPU] Support a fdot2 pattern..

Reworded the comment about the flag requirements.

Jul 13 2018, 10:26 AM
FarhanaAleen updated the diff for D49146: [AMDGPU] Support a fdot2 pattern..

Added fast-math flag+allow-contract flag and more test-cases.

Jul 13 2018, 10:02 AM

Jul 12 2018

FarhanaAleen added inline comments to D49146: [AMDGPU] Support a fdot2 pattern..
Jul 12 2018, 11:20 AM

Jul 10 2018

FarhanaAleen added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.

Jul 10 2018, 12:37 PM
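
The rewrite suggested in the comment above can be checked arithmetically with a small sketch. It models only the shape of the expression: Python floats are doubles, so neither the f16-to-f32 extension nor fused single rounding is represented, and the names `nested_fma` and `fdot2_reference` are hypothetical.

```python
def fma(a, b, c):
    # Plain multiply-add; NOT a fused, single-rounding operation.
    return a * b + c


def nested_fma(s0, s1, s2):
    # fma(S0.x, S1.x, fma(S0.y, S1.y, S2)) as written in the comment.
    return fma(s0[0], s1[0], fma(s0[1], s1[1], s2))


def fdot2_reference(s0, s1, s2):
    # Two-element dot product plus accumulator.
    return s0[0] * s1[0] + s0[1] * s1[1] + s2


# Inputs chosen to be exactly representable so rounding plays no role.
s0, s1, s2 = (1.5, 2.0), (4.0, 0.25), 3.0
print(nested_fma(s0, s1, s2), fdot2_reference(s0, s1, s2))
```

With inputs that are not exactly representable, the two forms can differ in the last bits, which is why the discussion in this review ties the transformation to denormal settings and fast-math or contract flags.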
FarhanaAleen added a comment to D49146: [AMDGPU] Support a fdot2 pattern..

As far as I understand, it should also be legal with -mattr=-fp32-denormals,-fp64-fp16-denormals, i.e. when both 32 and 16 denorms are not supported. Right? Not that it really helps in the real world.
Otherwise it shall be legal if either UnsafeAlgebra or AllowContract flag is set on both FMA nodes.

Jul 10 2018, 12:03 PM
FarhanaAleen created D49146: [AMDGPU] Support a fdot2 pattern..
Jul 10 2018, 9:36 AM

May 9 2018

FarhanaAleen updated the diff for D46604: [AMDGPU] Support horizontal vectorization of min/max..

Removed the dependency from the destination register.

May 9 2018, 1:06 PM
FarhanaAleen added a reviewer for D46604: [AMDGPU] Support horizontal vectorization of min/max.: rampitec.
May 9 2018, 11:57 AM
FarhanaAleen removed a reviewer for D46604: [AMDGPU] Support horizontal vectorization of min/max.: rampitec.
May 9 2018, 10:12 AM
FarhanaAleen retitled D46604: [AMDGPU] Support horizontal vectorization of min/max. from [AMDGPU] Support horizontal vectorization on min/max. to [AMDGPU] Support horizontal vectorization of min/max..
May 9 2018, 10:11 AM
FarhanaAleen changed the visibility for D46604: [AMDGPU] Support horizontal vectorization of min/max..
May 9 2018, 9:39 AM

May 8 2018

FarhanaAleen created D46604: [AMDGPU] Support horizontal vectorization of min/max..
May 8 2018, 2:54 PM

May 2 2018

FarhanaAleen updated subscribers of D46337: [AMDGPU] performAddCombine should run after DAG is legalized..
May 2 2018, 9:25 AM

May 1 2018

FarhanaAleen created D46337: [AMDGPU] performAddCombine should run after DAG is legalized..
May 1 2018, 4:11 PM
FarhanaAleen updated subscribers of D46213: [AMDGPU] Support horizontal vectorization..
May 1 2018, 2:22 PM
FarhanaAleen updated the diff for D46213: [AMDGPU] Support horizontal vectorization..

Added default label.

May 1 2018, 1:33 PM

Apr 30 2018

FarhanaAleen updated the diff for D46213: [AMDGPU] Support horizontal vectorization..

Added {{$}}.
Removed the amdgpu-slp_vectorizer switch.

Apr 30 2018, 1:09 PM
FarhanaAleen added inline comments to D46213: [AMDGPU] Support horizontal vectorization..
Apr 30 2018, 10:46 AM
FarhanaAleen updated the diff for D46213: [AMDGPU] Support horizontal vectorization..

Added op_sel clauses.
Addressed other comments.

Apr 30 2018, 10:46 AM

Apr 27 2018

FarhanaAleen created D46213: [AMDGPU] Support horizontal vectorization..
Apr 27 2018, 2:53 PM

Apr 25 2018

FarhanaAleen abandoned D45834: [TTI] Add a hook to TTI for choosing scalarized shuffle-reduction sequence for reduction idiom.

Thanks Hideki, I will think about your suggestion.

Apr 25 2018, 2:41 PM

Apr 20 2018

FarhanaAleen updated the diff for D45834: [TTI] Add a hook to TTI for choosing scalarized shuffle-reduction sequence for reduction idiom.

Hi Hideki.

Apr 20 2018, 5:04 PM

Apr 19 2018

FarhanaAleen created D45834: [TTI] Add a hook to TTI for choosing scalarized shuffle-reduction sequence for reduction idiom.
Apr 19 2018, 12:08 PM

Apr 10 2018

FarhanaAleen abandoned D45393: [InstCombine] Scalarize binary ops following shuffles..
Apr 10 2018, 3:42 PM
FarhanaAleen added a comment to D45393: [InstCombine] Scalarize binary ops following shuffles..

It's the "in general" and "most of the time" qualifiers that raise the red flag for me.

I can see that.

Apr 10 2018, 9:07 AM

Apr 9 2018

FarhanaAleen added a comment to D45393: [InstCombine] Scalarize binary ops following shuffles..

Thanks for your feedback and I agree with you guys.

Apr 9 2018, 1:43 PM

Apr 6 2018

FarhanaAleen created D45393: [InstCombine] Scalarize binary ops following shuffles..
Apr 6 2018, 3:38 PM

Apr 3 2018

FarhanaAleen created D45219: [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Apr 3 2018, 11:31 AM

Mar 9 2018

FarhanaAleen created D44319: [AMDGPU]Supported ds_write_b128 generation..
Mar 9 2018, 10:59 AM
FarhanaAleen added inline comments to D44210: [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Mar 9 2018, 8:42 AM
FarhanaAleen updated the diff for D44210: [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Mar 9 2018, 8:42 AM

Mar 8 2018

FarhanaAleen updated the diff for D44210: [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.

Enabled ds_read_b128 under a switch and incorporated additional comments.

Mar 8 2018, 4:12 PM

Mar 7 2018

FarhanaAleen created D44210: [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Mar 7 2018, 8:06 AM

Mar 6 2018

FarhanaAleen created D44179: [AMDGPU] Widened vector length for global/constant address space..
Mar 6 2018, 4:11 PM

Mar 2 2018

FarhanaAleen abandoned D44045: [AMDGPU] Adjusted alignment-check for local address space; .
Mar 2 2018, 6:38 PM
FarhanaAleen added a comment to D44045: [AMDGPU] Adjusted alignment-check for local address space; .

Thank you guys. My assumption was wrong; I was thinking that each allocation gets 64-dword alignment.

Mar 2 2018, 6:31 PM
FarhanaAleen created D44045: [AMDGPU] Adjusted alignment-check for local address space; .
Mar 2 2018, 2:27 PM

Feb 16 2018

FarhanaAleen updated the diff for D43275: [AMDGPU]Increased vector length for global/constant loads. .

Renamed the instructions to get rid of the numeric values.

Feb 16 2018, 2:42 PM

Feb 14 2018

FarhanaAleen added a comment to D43275: [AMDGPU]Increased vector length for global/constant loads. .

Does amdgpu only support gfx6 (si) and above? I thought northern islands was supported by the r600 backend.

Feb 14 2018, 8:29 AM

Feb 13 2018

FarhanaAleen created D43275: [AMDGPU]Increased vector length for global/constant loads. .
Feb 13 2018, 8:10 PM