- User Since
- Mar 21 2018, 12:29 PM (237 w, 1 d)
May 16 2022
Hi Sjoerd and JinGu, I remember from your comments on the initial DFAJumpThreading patch, that you guys are interested in the opportunities in SPEC2017 perlbench. I wanted to let you know that this patch introduces a limit on the number of threaded paths per switch. The limit is 200 paths, that means that if a switch has 1,000 potential path candidates, only 200 of them will get threaded, the rest of them will circle back to the switch instruction the usual way. The thing is that perlbench has several switches optimized by DFAJumpThreading, and a couple of them have the number of paths above the limit (in the thousands). Due to the limit, these switches are threaded partially, which might lead to performance regressions.
May 4 2022
No noticeable compile time impact.
Compare the number of switches threaded before and after this patch.
Limit the number of paths explored to contain compile time.
Apr 25 2022
I'm getting compile time numbers, will submit them soon. We're also planning on adding more test cases before the patch is merged.
Code size increase is very small.
Mar 21 2022
Feb 1 2022
Jan 31 2022
Thank you @ollef this LGTM
Dec 23 2021
@niravd Hi Nirav, my team and I discovered a compile time regression due to this patch. I submitted a bug report for it, we would really appreciate if you could take a look at it:
Dec 15 2021
I attach a picture of the CFG illustrating the problem in one of the llvm-test-suite tests that was failing on X86.
Aug 3 2021
Jul 30 2021
Compile time statistics gathered on CTMark (no change, basically):
Jul 29 2021
Jul 27 2021
Jul 26 2021
Rebase on top of the latest main
Jul 23 2021
Jul 22 2021
- Add a test for the minsize attribute
- Test MaxPathDepth with a test case where only a subset of paths is threaded
- Reduce the test case for the cost model by passing the threshold through CLI
- Fix a bug in how the NumTransforms statistic is incremented
- Expand the cost model to include the case for jump tables
Jul 14 2021
- Add optimization remark emitter
- Test for min code size
- Test with blocks that cannot be cloned
- Test with some unsupported instructions
- Test unpredictable switch
- Replace all mentions of FSM with DFA
- Always use the term "Threading Path", instead of interchanging "threading" with "threadable"
Jul 13 2021
Jul 12 2021
Jul 9 2021
Addressed the inline comments, added a co-author.
Jun 24 2021
Jun 16 2021
@efriedma Eli, I believe all your comments have been addressed, please take another look when you get a chance, thanks.
Jun 11 2021
Fix a clang-format warning
Jun 9 2021
Fixed the bug submitted by @dnsampaio, test case added
Jun 7 2021
Mar 12 2020
Addressed the rest of Florian's comments.
Mar 11 2020
Addressed part of Florian's comments.
Mar 10 2020
CFG before the patch, LCSSA broken:
Feb 13 2020
Feb 11 2020
Jeroen, we discovered an assertion in introduceNoAliasWhenCopyGuardIndicesAreCompatible() for the case described below; we would appreciate your input.
Mar 22 2019
Aug 7 2018
For some of the failing ARM/AArch64 tests, I see additional mov-s being performed; for example, in CodeGen/AArch64/and-sink.ll: if you take a look at the assembly after applying the patch, you will see an addtional mov for the trace when %c (w1) equals to zero. I'm not sure how important it is, so I would appreciate some feedback from ARM/AArch64 backend people. Please note that for performance-critical atomic compare-and-swap operations, performance is unchanged.
Differential is updated with:
Aug 1 2018
All the test failures are mismatches between expected assembly and assembly produced. One exception is CodeGen/AMDGPU/early-if-convert.ll, where the backend fails with an assertion:
@jonpa Jonas, I see a different assembly in one of SystemZ unit tests; my memory of Z is pretty fuzzy, so could you please make sense of the change to see whether it is reasonable?
Jul 31 2018
This patch fails 14 tests; however, it seems that the problem is not in the patch but in flaky test cases. I took a quick look at CodeGen/PowerPC/vsx.ll: the test case fails after not finding copy instructions, which were redundant and expectedly removed by this patch.