Removed adjust stack and stores.
Dec 13 2018
Removed calls from mir tests.
Can you try implementing the other approach first, and then applying this on top of it to show the difference more clearly?
Maintained 80 chars per line, added GCN-LABEL, reduced mir tests.
Dec 12 2018
Can we have a mir test with more than two loads? I want to see a situation where 3 loads are foldable with the same offset, but lowest address is in the middle. I.e.:
Yes, I added two more mir tests called LowestInMiddle and NegativeDistance.
Updated with the reviewer's comments.
Dec 11 2018
Why aren't these matched in the first place? These shouldn't have gotten this far.
Dec 10 2018
Nov 1 2018
Oct 31 2018
Oct 2 2018
Can the tests be reduced/made more flexible? E.g., the tests previously used FileCheck variables ([[FF:s[0-9]+]]).
Sep 28 2018
Sep 25 2018
Sep 17 2018
Thanks, this mostly looks good to me. Looks like this may be running into a serious limitation of the ISel infrastructure with commutativity / associativity, but it makes sense to land this patch without addressing it. I do have one last question.
Sep 14 2018
Updated the test checks; they are now generated purely by update_llc_test_checks.
Sep 13 2018
Sep 12 2018
Sep 11 2018
Aug 28 2018
Defined the pattern using foldl (I was wrong; foldl does support DAG patterns).
Aug 24 2018
Aug 20 2018
It looks like there are no further comments. In that case, I will go ahead and check it in.
Can you also add tests/support for the negated form, i.e., -S0.u8 * S1.u8 - S0.u8 * S1.u8 - S0.u8 * S1.u8 - S2.u32? I'm not sure how this will canonicalize, but I don't think we do as much as we do with FP negates, since we don't have int source modifiers.
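A minimal reference model may help when writing those tests. It assumes the unsigned form is a 4-lane u8 dot product with a 32-bit accumulator (the lane count is an assumption; the expression quoted above lists three products). ref_udot4 and ref_neg_form are hypothetical helper names, not LLVM or ISA names. Because 32-bit integer arithmetic wraps modulo 2^32, the negated form equals 0 minus the positive form, so it is at least expressible as a negate of the dot-product result.

```c
#include <stdint.h>

/* Hypothetical reference model: unsigned 4 x u8 dot product plus a 32-bit
 * accumulator. Byte i of a is multiplied by byte i of b; all arithmetic
 * wraps modulo 2^32, like a 32-bit register. */
uint32_t ref_udot4(uint32_t a, uint32_t b, uint32_t acc) {
    uint32_t sum = acc;
    for (int i = 0; i < 4; ++i) {
        uint32_t ai = (a >> (8 * i)) & 0xFFu;
        uint32_t bi = (b >> (8 * i)) & 0xFFu;
        sum += ai * bi;
    }
    return sum;
}

/* Negated form written the way the expression reads:
 * -p0 - p1 - p2 - p3 - acc, also modulo 2^32. */
uint32_t ref_neg_form(uint32_t a, uint32_t b, uint32_t acc) {
    uint32_t r = 0u;
    for (int i = 0; i < 4; ++i) {
        uint32_t ai = (a >> (8 * i)) & 0xFFu;
        uint32_t bi = (b >> (8 * i)) & 0xFFu;
        r -= ai * bi;
    }
    return r - acc;
}
```

For example, with a = 0x01020304 and b = 0x01010101 the products are 4, 3, 2 and 1, so ref_udot4(a, b, 10) is 20 and ref_neg_form(a, b, 10) is 0u - 20u.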
Aug 17 2018
Aug 14 2018
Added:
- a testcase with sign_extend_inreg from 8 bits to 32 bits
- checks for SI and VI
- checks generated by update_llc_test_checks
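For reference, sign_extend_inreg from i8 to i32 keeps the low 8 bits of a 32-bit value and reinterprets them as signed. The sketch below writes that portably in C and, assuming the testcase exercises a signed 4 x i8 dot product, shows where the per-byte sign extension sits. sext_inreg_i8 and ref_sdot4 are hypothetical helper names.

```c
#include <stdint.h>

/* Portable equivalent of sign_extend_inreg(i32 x, i8): take bits [7:0]
 * and map 0x80..0xFF to -128..-1 without relying on shifts of negatives. */
int32_t sext_inreg_i8(uint32_t x) {
    return (int32_t)((x & 0xFFu) ^ 0x80u) - 0x80;
}

/* Hypothetical signed 4 x i8 dot product with a 32-bit accumulator.
 * Each byte lane is sign-extended before multiplying; the sum is kept in
 * uint32_t so overflow wraps. The final conversion back to int32_t assumes
 * two's complement. */
int32_t ref_sdot4(uint32_t a, uint32_t b, int32_t acc) {
    uint32_t sum = (uint32_t)acc;
    for (int i = 0; i < 4; ++i) {
        int32_t ai = sext_inreg_i8(a >> (8 * i));
        int32_t bi = sext_inreg_i8(b >> (8 * i));
        sum += (uint32_t)(ai * bi);
    }
    return (int32_t)sum;
}
```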
Aug 9 2018
If all the operations happen in 16 bits, the pattern is detected. Added a testcase for that pattern.
Aug 6 2018
But all the types are legal already?
Yes, they are legal types, but we have packed instructions that can operate on a pair of 16-bit values, so packed types can be treated as a 32-bit scalar type.
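To illustrate treating a packed pair as one 32-bit scalar: two 16-bit lanes fit into a single 32-bit value, and a packed operation can read both halves from it. This is a sketch assuming lane 0 lives in bits [15:0] and lane 1 in bits [31:16]; pack_2x16, lane_2x16 and ref_udot2 are hypothetical names, not LLVM or ISA APIs.

```c
#include <stdint.h>

/* Pack two 16-bit lanes into one 32-bit scalar (assumed lane order:
 * lane 0 in the low half, lane 1 in the high half). */
uint32_t pack_2x16(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Extract lane 0 or 1 from a packed 32-bit value. */
uint16_t lane_2x16(uint32_t v, int lane) {
    return (uint16_t)(v >> (16 * lane));
}

/* Hypothetical unsigned 2 x u16 dot product that consumes both operands as
 * 32-bit scalars; the accumulation wraps modulo 2^32. */
uint32_t ref_udot2(uint32_t a, uint32_t b, uint32_t acc) {
    uint32_t p0 = (uint32_t)lane_2x16(a, 0) * lane_2x16(b, 0);
    uint32_t p1 = (uint32_t)lane_2x16(a, 1) * lane_2x16(b, 1);
    return acc + p0 + p1;
}
```

For example, ref_udot2(pack_2x16(3, 4), pack_2x16(5, 6), 1) computes 1 + 3*5 + 4*6 = 40.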
Supported the transformation using table gen patterns.
Jul 31 2018
- Removed SDValue initialization
- Removed function calls from the testcases.
- Returned SDValue instead of bool+SDValue
Jul 30 2018
Jul 21 2018
Jul 19 2018
Jul 18 2018
Jul 13 2018
Reworded the comment about the flag requirements.
Added the fast-math and allow-contract flag requirements and more test cases.
Jul 12 2018
Jul 10 2018
By the way, since types are being mixed, shouldn't the summary say something like optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, S2)) --> fdot2(S0, S1, S2)? We only want this transformation if S0 and S1 are <2 x f16>.
As far as I understand, it should also be legal with -mattr=-fp32-denormals,-fp64-fp16-denormals, i.e., when neither 32-bit nor 16-bit denormals are supported. Right? Not that it really helps in the real world.
Otherwise it shall be legal if either UnsafeAlgebra or AllowContract flag is set on both FMA nodes.
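The legality rule discussed in these comments can be restated as a small predicate: the fold applies only when S0 and S1 are <2 x f16>; it is then legal when neither 32-bit nor 16-bit denormals are supported, and otherwise only when each of the two FMA nodes carries UnsafeAlgebra or AllowContract. This is a sketch of the rule as stated in the review, not the patch's code; the structs and function names are hypothetical and are not LLVM's SDNodeFlags or subtarget APIs.

```c
#include <stdbool.h>

/* Hypothetical per-node fast-math flags (not LLVM's SDNodeFlags). */
struct fma_flags {
    bool unsafe_algebra;
    bool allow_contract;
};

/* Hypothetical subtarget denormal settings. */
struct denorm_mode {
    bool fp32_denormals;
    bool fp16_denormals;
};

/* A single FMA node permits fusion if either flag is set on it. */
static bool node_allows_fusion(struct fma_flags f) {
    return f.unsafe_algebra || f.allow_contract;
}

/* fma(S0.x, S1.x, fma(S0.y, S1.y, S2)) --> fdot2(S0, S1, S2) */
bool can_fold_to_fdot2(bool operands_are_v2f16, struct denorm_mode m,
                       struct fma_flags outer, struct fma_flags inner) {
    if (!operands_are_v2f16)
        return false;  /* only <2 x f16> sources qualify */
    if (!m.fp32_denormals && !m.fp16_denormals)
        return true;   /* no denormal support at either width */
    return node_allows_fusion(outer) && node_allows_fusion(inner);
}
```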
May 9 2018
Removed the dependency from the destination register.
May 8 2018
May 2 2018
May 1 2018
Added default label.
Apr 30 2018
Removed the amdgpu-slp_vectorizer switch.
Added op_sel clauses.
Addressed other comments.
Apr 27 2018
Apr 25 2018
Thanks Hideki, I will think about your suggestion.
Apr 20 2018
Apr 19 2018
Apr 10 2018
It's the "in general" and "most of the time" qualifiers that raise the red flag for me.
I can see that.
Apr 9 2018
Thanks for your feedback and I agree with you guys.
Apr 6 2018
Apr 3 2018
Mar 9 2018
Mar 8 2018
Enabled ds_read_b128 under a switch and incorporated additional comments.
Mar 7 2018
Mar 6 2018
Mar 2 2018
Thank you guys. My assumption was wrong, I was thinking that each allocation gets 64-dword alignment.
Feb 16 2018
Renamed the instructions to get rid of the numeric values.
Feb 14 2018
Does amdgpu only support gfx6 (si) and above? I thought northern islands was supported by the r600 backend.