User Details
- User Since
- Apr 16 2014, 6:04 PM (492 w, 2 d)
Aug 21 2023
LGTM
Jun 13 2023
May 2 2023
Mar 15 2023
The cases you changed *LGTM* (really good to me!! :-)) I did not check that all cases are covered, but this shouldn't hold up the merge.
Mar 10 2023
LGTM.
Feb 1 2023
Given that this is RISC-V and under a flag, this LGTM. I would like to see stats on the FMA's plus the changes to the cycle counts on the critical path, and see how the data correlate to your measured run-time performance numbers. And ditto for the current heuristic. This might also help understand the wide variety of results in your SPEC data. Your numbers look all over the place. Also, you can probably push your idea more by allowing a parameterized schedule window (eg 10 or 15 instructions) rather than a basic block. This would allow you to catch cases across blocks and should work better for large blocks. Finally, I would not be surprised - just learning from your insights here and guessing - that various in-order processors show best performance for different window sizes. All this is just food for thought for additional/future work though. Cheers!
Jan 17 2023
LGTM for both (GISel & SDAG).
Jan 6 2023
LGTM.
This is a straightforward fix. LGTM.
Jul 7 2022
LGTM!
Dec 23 2021
LGTM.
Sep 27 2021
Agree on the painpoints, but from the user perspective Lang's proposal (2) is a considerate approach to handle this.
Jul 27 2021
Thanks, Tim. LGTM.
Jul 21 2021
Jul 16 2021
LGTM.
Nice cleanup, too. LGTM.
Jul 14 2021
This is straightforward. LGTM.
Mar 25 2021
Mar 10 2021
Nice! This is a straightforward extension with minor code clean-up.
Dec 2 2020
LGTM.
Nov 2 2020
Jul 17 2020
Synced with the LLDB team. LGTM.
May 12 2020
It looks a bit weird to check for FP constant again (see line 2802). The crux is that constant folding for CE fires only when size > 8 on line 2825. If folding happened at this point independent of size, then you would not need your checks in 2840ff. So could the size check be simply removed for the fix?
Feb 13 2020
+ Duncan, Michae for clang.
+ Amara, David for another opinion. Seems straight forward & LGTM.
Especially for inliner changes I suggest a) sharing performance numbers on standard benchmarks with O2 & O2 (thin) LTO, and b) offering a flag to toggle individual changes. Performance here includes run-time performance, compile-time, and code size. A simple change can have detrimental impact on any of key metrics folks might be watching, and that that impact could be different depending on the targets.
Feb 6 2020
Nice. LGTM.
Dec 12 2019
A question and a nit to get it started.
Ups, I removed the original reviewers?! Please add anyone else I missed. I only remember Kristof. Sorry for this nuisance!
Dec 5 2019
LGTM.
Dec 3 2019
I have one suggestion and one more question.
Nov 22 2019
Some food for thought & perhaps you could add a test (or more) for FP types other than double?
Nov 12 2019
Aug 21 2019
Jun 26 2019
+ reviewers to move this forward hopefully
Apr 16 2019
Could you separate size one checks are OK vs. checks that require even powers of 2? That smells like an inconsistency worth calling out explicitly.
Mar 18 2019
Looks reasonable. Added more reviewers for second opinions.
Dec 17 2018
+ Akira, Quentin for their driver / x86_64h experience for a quick double check. Fwiw, LGTM.
Dec 14 2018
This is straightforward way to pass along alignment tags and fixes (at least one critical) bug. LGTM.
Dec 13 2018
LGTM
Dec 12 2018
Oct 23 2018
Jul 24 2018
+ David, Zia
Jun 13 2018
May 8 2018
The test case could possibly be shorter, but the change LGTM.
Feb 22 2018
Yes, this LGTM. I could not find a way to perform updates properly in the example. If recomputing the dependencies results in a compile-time issue we will have to dig deeper into MemDep.
Feb 21 2018
Jan 31 2018
I assume you will take care of the comment. LGTM.
Jan 30 2018
This is very close now. Could you add an explicit examples (eg show the IR) showing which initialization remain slow (Complex) and which are fast now? This should also address the spirit of Adrian's question I think.
Jan 29 2018
I like the spirit of the idea. What made you look into this? Some more questions and suggestions below, but LGTM as is.
Thank you drilling into this! I have a few questions below. Also, could you comment on the time savings you measured for your implementation?
Jan 4 2018
+ Matthias for review & thoughts about the unit test.
Jan 2 2018
Thanks! This LGTM then.
I see all issues that came up in this thread covered by the last patch. Before this can be committed I still want to check that all paths/issues are tested/covered wrt to fast-isel:
a) how do we guarantee that there is no fallback from GISel to FastISel (when GISel is supported)? This is probably a nit since it is obvious to everyone deeper in the implementation then I am.
b) for all the tests where fast-isel was added shouldn't there equivalent tests for GISel, too? Even if the tests target fast-isel specific issues how do we make sure GISel does not expose similar/same bugs?
Dec 5 2017
It looks pretty straightforward, but I'd ask (at least) Duncan or Akira, and Matthias to review this more carefully.
I think your commit makes accounting for both cases - the instructions inserted and the instructions deleted - consistent. From that angle this look OK. I have also some food for thought: in both cases the code makes the assumption that it inserts a dependent instruction chain. Only then it is correct to simply add instruction latencies. I have not checked that this assumption is correct for all cases. So I suggest to add a comment about this. Assuming also that you don't see any notable perf regression you can go ahead and commit from my perspective: LGTM.
Oct 11 2017
Thanks for working on this. LGTM!
Sep 6 2017
LGTM. Just some food for thought for possible follow up commits.
Sep 5 2017
I added a few minor remarks that you might want to consider before commit. Again, thanks for working on this and your patience!
Aug 25 2017
Thank you for sharing your numbers. I was hoping you had that show-off case.
I just have a few minor comments about readability and asserts. I still need bit more time to convince myself that all the pieces fit together. I was wondering if you could add a test that dumps the depths w/ and w/o the incremental updates (compile w/ full comination + dumps depths, compile with a threashold of 1 + dump depths, diff the dump)?
Aug 21 2017
Hi Florian
Jul 27 2017
Jul 19 2017
It is not clear to me if and how the original questions have been answered by this patch yet. Could you elaborate and add comments, please? Much appreciate!
Jul 18 2017
Thanks for working on this! Please add comments and show off one or two examples that highlight the benefits of your extensions.
Jul 12 2017
Jul 6 2017
This is interesting. Unfortunately these heuristic changes are always tricky and never satisfying. Is this change motivated entirely by limitations of the current shrink-wrapping algorithm or do you see gains from better allocation also.
Jun 29 2017
Just a first superficial review. I haven't thought about the underlying concept itself yet.
May 11 2017
LGTM.
May 2 2017
Thanks, Kyle!
Apr 20 2017
I don't know if this patch makes the situation better or worse. It seems to touch on the tip of an iceberg. The underlying problem is: What can or should happen when a loop is irreducible? It the answer is "don't unroll", then this is the fix your actually looking for. Obviously this loop is irreducible since it has a retreat edge from sink to body2, but body2 does not dominate sink. I'm also curious if this loop is in the source code already or if some pass in the compiler actually generated. FWIW, unless the compiler has a problem irreducible loops should be rare. And when the user writes an irreducible loop, I think it is OK when compiler optimizations turn conservative.
Apr 16 2017
I would like to understand this better. Could somebody explain what assumptions SCEV makes about its clients? Which assumption(s) is broken by SLP? It seems to me that this issue potentially touches fundamental design decisions/questions and I don't see any verifiers in place.
Apr 12 2017
Please double check your code one more time. Thank you!
Mar 28 2017
I'd love to see some comments an get your thoughts about a verifier.
Feb 9 2017
Thanks for following up on this.
Jan 26 2017
I think the only issue that needs to be addressed is (finally!) sharing perf data. This has been raised at least 3 times. The possible compile-time implication, the speciality of the application (fast-math) etc are well understood.
Jan 24 2017
Hans, you were right. The computeKnowBits etc parts should have been moved also. Sanjay committed the fix proper in r289442. I only kept the regression test in narrow-switch.ll (r293018).
Jan 18 2017
I'm leaning towards a LGTM since you addressed basically all my issues, but more people mushroomed and are curious about your performance data. So I think can't dodge that question anymore and need to share some good data for the benchmark(s) you are tuning for before you get the nod.
Jan 2 2017
From my perspective the implementation is close and only requires a few minor changes.
Dec 22 2016
The "automatic" generation of pattern e.g. with TableGen is on my longer term wish list, not a requirement for this patch. Sorry if my wording was confusing.
Do you have performance numbers?
Dec 21 2016
A few general remarks:
- I'm very much in favor of the MC combiner approach
- But I'm getting increasingly concerned about the MC code quality. I felt the FMA/Aarch64 support starts looking crabby (I'm the culprit) and this doesn't look much better I'm afraid. The approach is in need of more automation for better maintainability. This is meant as food for thought. Since I"m too blame for the approach I can't to be harsh in reviews :-)
- I'm also concerned about the compile-time (in particular since we don't track x86 specific issues ( or any other backend than Aarch64- or least I'm not aware that anyone is watching closely). Could you share some specific data about your perf gains and compile-time measurements? However, I think this optimization is for fast math only and compile-time is probably less of an issue in that mode. One way to deal with compile-time issues is to wrap some MC under an option.
- Perhaps I missed it but I expected the optimization to kick in only under fast math. I saw 'fast' in the test cases, but didn't see a check in the code.
Dec 20 2016
Thanks! LGTM.
Dec 13 2016
Dec 12 2016
Dec 9 2016
Michael, could you also lend this your expert eye? Do you agree with the extra memory (pair vs vector) and compile-time (look at the loop tree) investment for this fix? Shortening the types may not be worth it at least on some architectures.
Dec 8 2016
Good catch. LGTM.
Dec 1 2016
Nov 18 2016
The Oz case looks interesting. Can you share more details/insights about the "inaccuracies" w/ specific examples? I'm wondering if that can be fixed in general or be more triggered towards some trunk characteristics. But this is just something we can think about and discuss while moving on and celebrate the ct/cs recoveries :-). So LGTM!
Nov 17 2016
Nov 14 2016
Thanks for following up!
LGTM
Nov 1 2016
I thought I understand this until about the middle of the review. Now I could use some help perhaps with variable names and comments that reflect more clearly on the expression(s) you simplify. I think this is what Renato is looking for, too.